Re: Phaistos in ConScript

2002-07-08 Thread Doug Ewell

Michael Everson  wrote:

> Say that we found another Phaistos document with the same string in
> it, and were able to decipher Phaistos, and found that the string
> matched in meaning and syntax to what's on the disk. Then we would
> have a superfluous character encoded.

You mean like U+0340 and U+0341?

(ducking and running)
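
[Archive note: for readers who miss the joke, U+0340 and U+0341 are the combining grave and acute tone marks, which the Unicode Character Database maps canonically onto U+0300 and U+0301. The short Python sketch below, using only the standard unicodedata module, shows that normalization folds them away; it illustrates the reference and is not part of the original message.]

```python
import unicodedata

# U+0340 / U+0341 have singleton canonical decompositions to U+0300 / U+0301.
for tone_mark in ("\u0340", "\u0341"):
    canonical = unicodedata.normalize("NFC", tone_mark)
    print(
        f"U+{ord(tone_mark):04X} {unicodedata.name(tone_mark)}",
        "->",
        f"U+{ord(canonical):04X} {unicodedata.name(canonical)}",
    )
```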





RE: Phaistos in ConScript

2002-07-08 Thread Michael Everson

At 15:51 -0700 2002-07-08, Asmus Freytag wrote:
>At 02:43 PM 7/8/02 +0100, Michael Everson wrote:
>>Godart says "The last sign of set A:VIII was not deleted but broke 
>>off with a sliver of clay. Bearing in mind the space and outline of
>>the gap, which seems to roughly follow the outline of the broken
>>sign, it seems that the most plausible identification of the 
>>mysterious sign is a 3 [TATTOOED HEAD] or a 20 [DOLIUM], unless it 
>>is an 8 [GAUNTLET] or a 4 [CAPTIVE], which is less likely." I don't 
>>want to encode a new character without better evidence (and 
>>wouldn't for ANY script). I haven't seen anything from other 
>>scholars who consider it a 46th sign.
>
>This is an insufficient reason for not coding a symbol for 
>unidentified character, since it is unidentified. U+FFFD could be 
>pressed into service, but would be awkward if definite agreement on 
>identification is reached later, as it can be used for any 
>unidentified character, not just Phaistos.

Sorry, this symbol is usually represented by a hatched pattern 
showing that something is missing. Godart uses [.] in his 
transcription. Since it is possible that sign 3, 20, 8, or 4 was 
actually there before the identifying clay broke off, it would be 
inappropriate to invent something new to represent the missing 
character. Say that we found another Phaistos document with the same 
string in it, and were able to decipher Phaistos, and found that the 
string matched in meaning and syntax to what's on the disk. Then we 
would have a superfluous character encoded.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Phaistos in ConScript

2002-07-08 Thread Michael Everson

Ken. Thanks for your response.

>  > Now let us say I wish to represent this text LTR, as I do. Well if I
>>  reverse the presentation order without I get PLUMED-HEAD SHIELD CLUB
>>  PEDESTRIAN BOOMERANG -- but if I don't reverse the glyphs, than
>>  plumed-head is still facing to the right, as is the boomerang -- how
>>  am I to know that the directionality is LTR?
>
>Because then it will say:
>
>GNAREMOOB NAIRTSEDEP BULC DLEIHS DAEH-DEMULP

As I said, the original might (assuming a syllabic structure and 
assigning random syllable values) well be LABUGIDANO, but when 
reversed it might read NODAGIBULA which could be a valid linguistic 
sequence. OK, so reading the whole text you would come up with 
readings which wouldn't make sense, so you would have to start over 
with a different directionality. Given the practice of the other 
scripts in the region, I consider this unlikely given its 
impracticality. The people who used scripts with multiple 
directionalities did reverse the glyphs when reversing the 
directionality. The inherent directionality of Phoenician BETH or of 
PLUMED-HEAD or of Egyptian WN (the bunny rabbit) lends itself to the 
use of such glyph-indicated directionality for text in general. I 
would not assume, additionally, that the Phaistos script would always 
be written on disks in spiral formatting. That too would be unlikely 
and impractical, would it not?

>  > I can't. I will start reading with the boomerang.
>
>What's the matter -- can't you read and write Phaistos correctly?

Hmpf.

>  > That Godart did not make this correction in his book when he used LTR
>  > directionality was an error. I'm sticking by the decision I made when
>  > I made my fonts, because it is more likely to be right than not.
>
>I think you may be sticking your neck out rather far (to the left) 
>on this one. I am inclined to agree with Marco about the issue for 
>presentation. Why should you innovate over Godart here in this 
>*particular* instance, based on so little evidence.

Because I suspect that Godart might well agree with me -- I don't 
imagine that he ever considered this aspect of text presentation. And 
because it makes sense given the context of other scripts in the 
region.

>You could be right, but then you could be wrong, too.

So could Godart! He was describing the disk, not thinking about 
encoding and presenting it!

>  > There aren't any other scripts in the area which change 
>directionality without
>  > reversing the glyphs, and Phaistos certainly isn't Chinese.
>
>Well that much I agree 100% with.

My point being that though Beijing and Hong Kong newspaper headlines 
might present LTR or RTL directionality without mirroring, this 
practice is rare or indeed unknown in Europe at 1700 BCE.

Well that's my opinion anyway. I suppose we could try to contact 
Godart and ask his opinion. It's not as though the CSUR is
normative.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Whats the difference between a composite and a combining sequence?

2002-07-08 Thread Tex Texin

That is also consistent with the glossary definitions:
http://www.unicode.org/glossary.
tex

Kenneth Whistler wrote:
> 
> Theodore,
> 
> > http://www.unicode.org/unicode/reports/tr15/ mentions both
> > composites and combining sequences.
> >
> > But it doesn't tell us the difference. I know what a combining
> > sequence is. If I didn't know what a composite was, I'd guess it
> > was the same thing as a combining sequence.
> 
> See TUS 3.0, Chapter 3, pp. 43-44
> 
> D17 Combining character sequence: a character sequence consisting of
> either a base character followed by a sequence of one or more
> combining characters, or a sequence of one or more combining
> characters.
> 
> [e.g. A + combining-grave  ]
> 
> D18 Decomposable character: a character that is equivalent to a sequence
> of one or more other characters, according to the decomposition
> mappings found in the names list... It may also be known as a
> precomposed character or composite character.
> 
> [e.g. A-grave, U+00C0]
> 
> --Ken

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Whats the difference between a composite and a combining sequence?

2002-07-08 Thread Kenneth Whistler

Theodore,

> http://www.unicode.org/unicode/reports/tr15/ mentions both 
> composites and combining sequences.
> 
> But it doesn't tell us the difference. I know what a combining 
> sequence is. If I didn't know what a composite was, I'd guess it 
> was the same thing as a combining sequence.

See TUS 3.0, Chapter 3, pp. 43-44

D17 Combining character sequence: a character sequence consisting of
either a base character followed by a sequence of one or more
combining characters, or a sequence of one or more combining
characters.

[e.g. A + combining-grave  ]

D18 Decomposable character: a character that is equivalent to a sequence
of one or more other characters, according to the decomposition
mappings found in the names list... It may also be known as a
precomposed character or composite character.

[e.g. A-grave, U+00C0]
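
[Archive note: the two definitions can be checked side by side. The sketch below is a minimal illustration in Python using the standard unicodedata module; the helper name describe is made up for this example. It takes the two examples given above, A followed by a combining grave and the precomposed A-grave, and shows that the decomposition mapping ties them together.]

```python
import unicodedata

# D17: a combining character sequence, base letter A plus COMBINING GRAVE ACCENT.
combining_sequence = "\u0041\u0300"

# D18: a decomposable (precomposed, "composite") character, A WITH GRAVE.
composite = "\u00C0"

def describe(text):
    """Return (code point, name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]

print(describe(combining_sequence))  # two characters; the second is a combining mark
print(describe(composite))           # one character
print(unicodedata.decomposition(composite))  # the decomposition mapping for U+00C0

# The composite is canonically equivalent to the sequence, in both directions.
print(unicodedata.normalize("NFD", composite) == combining_sequence)
print(unicodedata.normalize("NFC", combining_sequence) == composite)
```

A nonzero combining class marks the second code point of the sequence as a combining character, whereas the composite is a single code point that merely decomposes to that sequence.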

--Ken




Whats the difference between a composite and a combining sequence?

2002-07-08 Thread Theodore H. Smith

http://www.unicode.org/unicode/reports/tr15/ mentions both 
composites and combining sequences.

But it doesn't tell us the difference. I know what a combining 
sequence is. If I didn't know what a composite was, I'd guess it 
was the same thing as a combining sequence.

However, the two are meant to be different, so they can't be the same.

If I am getting the Unicode terminology correct, a combining 
sequence is like a plain ASCII letter A, with the accent 
following.





Acrobat question

2002-07-08 Thread Michael Everson

A bit off topic, but if you have a PDF file without page numbers on 
the original pages, is it possible to add them so that when it prints 
the page numbers appear?
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Multiple encodings for 1 character

2002-07-08 Thread Theodore H. Smith

>> For example, for filenames, OSX will encode an accented Roman
>> letter one way, while for filenames Windows will encode it the
>> other way. These kind of confusions are totally expected, if
>> Unicode will allow more than one way to encode the same
>> character.
>
> Perhaps a stray newsfeed routed via Alpha Centauri?
> This is *very* old news, indeed.

I'm new to this, though.

>> This means that matching algorithm's won't work, because the
>> characters are different!
>>
>> Will there be some kind of recommendation of which to avoid?
>> Will the Unicode consortium make a standard to say that one of
>> these encodings is strongly not recommended, and in fact
>> depreciated?
>
> UAX #15: Unicode Normalization Forms
>
> http://www.unicode.org/unicode/reports/tr15/

Thanks.

> And it is up to an implementation to specify which normalization
> form it uses.
>
> By the way, we don't depreciate Unicode encodings -- we appreciate
> them. ;-)

That's a shame. Simplicity is wonderful.

--
 Theodore H. Smith - Macintosh Consultant / Contractor.
 My website: 





RE: Phaistos in ConScript

2002-07-08 Thread Asmus Freytag

At 02:43 PM 7/8/02 +0100, Michael Everson wrote:
>Godart says "The last sign of set A:VIII was not deleted but broke off 
>with a sliver of clay. Bearing in mind the space and outline of the gap,
>which seems to roughly follow the outline of the broken sign, it seems
>that the most plausible identification of the mysterious sign is a 3 
>[TATTOOED HEAD] or a 20 [DOLIUM], unless it is an 8 [GAUNTLET] or a 4 
>[CAPTIVE], which is less likely." I don't want to encode a new character 
>without better evidence (and wouldn't for ANY script). I haven't seen 
>anything from other scholars who consider it a 46th sign.


This is an insufficient reason for not coding a symbol for an unidentified
character, since it is unidentified. U+FFFD could be pressed into service, 
but would be awkward if definite agreement on identification is reached 
later, as it can be used for any unidentified character, not just Phaistos.






Re: Multiple encodings for 1 character

2002-07-08 Thread Kenneth Whistler

Theodore wrote:

> What is going to be done about the confusion generated from 
> having multiple ways to encode the same character?
> 
> For example, for filenames, OSX will encode an accented Roman 
> letter one way, while for filenames Windows will encode it the 
> other way. These kind of confusions are totally expected, if 
> Unicode will allow more than one way to encode the same 
> character.

Perhaps a stray newsfeed routed via Alpha Centauri?
This is *very* old news, indeed.

> 
> This means that matching algorithm's won't work, because the 
> characters are different!
> 
> Will there be some kind of recommendation of which to avoid? 
> Will the Unicode consortium make a standard to say that one of 
> these encodings is strongly not recommended, and in fact 
> depreciated?

UAX #15: Unicode Normalization Forms

http://www.unicode.org/unicode/reports/tr15/

And it is up to an implementation to specify which normalization
form it uses.
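
[Archive note: a minimal sketch of what "specify which normalization form it uses" can look like in practice, in Python with the standard unicodedata module. The helper same_text and the file names are hypothetical, and the sketch makes no claim about which form any particular operating system actually stores.]

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The same visible file name, once precomposed and once decomposed.
precomposed = "caf\u00E9.txt"   # e with acute as a single code point
decomposed = "cafe\u0301.txt"   # e followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # raw code point comparison
print(same_text(precomposed, decomposed))          # compare under NFC
print(same_text(precomposed, decomposed, "NFD"))   # compare under NFD
```

Either form works for matching as long as both sides of the comparison use the same one, which is the point of leaving the choice to the implementation.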

By the way, we don't depreciate Unicode encodings -- we appreciate
them. ;-)

> And what about the OS that uses this encoding? How will the 
> Unicode consortium make the newly-offending OS change it's ways?

It isn't offending, and the Unicode Consortium won't.

--Ken




Re: Multiple encodings for 1 character

2002-07-08 Thread David Possin

You will have to normalize the way the strings are processed, and you
need to make sure it is done the same way every time. Check out ICU for
this purpose. 

http://oss.software.ibm.com/icu/

Dave
--- "Theodore H. Smith" <[EMAIL PROTECTED]> wrote:
> What is going to be done about the confusion generated from 
> having multiple ways to encode the same character?
> 
> For example, for filenames, OSX will encode an accented Roman 
> letter one way, while for filenames Windows will encode it the 
> other way. These kind of confusions are totally expected, if 
> Unicode will allow more than one way to encode the same 
> character.
> 
> This means that matching algorithm's won't work, because the 
> characters are different!
> 
> Will there be some kind of recommendation of which to avoid? 
> Will the Unicode consortium make a standard to say that one of 
> these encodings is strongly not recommended, and in fact 
> depreciated?
> 
> And what about the OS that uses this encoding? How will the 
> Unicode consortium make the newly-offending OS change it's ways?
> 
> And what about the hordes of apps that expect one format but 
> don't expect the other? And the hoardes of OS independant apps 
> (Java? Perl?) that might generate conflicting versions?
> 
> 


=
Dave Possin
Globalization Consultant
www.Welocalize.com
http://groups.yahoo.com/group/locales/





RE: Phaistos in ConScript

2002-07-08 Thread Michael Everson

You guys are not thinking things through. Firstly the fact that the 
only document we have was made with stamps rather than drawn by hand 
means nothing. Chinese can be written with a brush, a pen, a chisel, 
or it can be impressed into wax with a seal.

You have to look at the structure of the script and think of legibility.

Firstly, most of the glyphs are strongly directional. Let us assume 
that we have a string of text PLUMED-HEAD SHIELD CLUB PEDESTRIAN 
BOOMERANG (that's as encoded in the backing store). The script shows 
RTL directionality, and when reading it we read into the face of the 
PLUMED-HEAD. SHIELD and CLUB are symmetrical, but PEDESTRIAN and 
BOOMERANG are not.

The characters display as BOOMERANG PEDESTRIAN CLUB SHIELD 
PLUMED-HEAD, where plumed-head faces right and the boomerang points 
right as well. We read RTL.

Now let us say I wish to represent this text LTR, as I do. Well if I 
reverse the presentation order I get PLUMED-HEAD SHIELD CLUB
PEDESTRIAN BOOMERANG -- but if I don't reverse the glyphs, then
plumed-head is still facing to the right, as is the boomerang -- how 
am I to know that the directionality is LTR? I can't. I will start 
reading with the boomerang.

Let's pretend we knew the syllabic values of these characters. 
PLUMED-HEAD is LA, SHIELD is BU, CLUB is GI, PEDESTRIAN is DA, 
BOOMERANG is NO. The correct reading must be LABUGIDANO, but if you 
reverse RTL directionality to LTR directionality without reversing 
the glyphs, you won't know that the directionality is changed, and 
you will be tempted to read NODAGIBULA. And what if that was a valid 
sequence in your language?
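
[Archive note: the thought experiment above can be written out as a small Python sketch. Everything in it is an assumption made for illustration: the syllable values are the invented ones from this message, the function names lay_out and read_into_faces are hypothetical, and the "reader" simply applies the rule that one reads into the faces of the directional signs.]

```python
# Invented syllable values from the message; not a decipherment.
SYLLABLES = {
    "PLUMED-HEAD": "LA", "SHIELD": "BU", "CLUB": "GI",
    "PEDESTRIAN": "DA", "BOOMERANG": "NO",
}
# Signs described in the message as symmetrical (no visible direction).
SYMMETRICAL = {"SHIELD", "CLUB"}

# Logical order, as encoded in the backing store.
BACKING_STORE = ["PLUMED-HEAD", "SHIELD", "CLUB", "PEDESTRIAN", "BOOMERANG"]

def lay_out(signs, direction, mirror_glyphs):
    """Return the visual row, left to right, as (sign, facing) pairs."""
    # On the disk the text runs RTL and directional signs face right.
    facing = "left" if (direction == "LTR" and mirror_glyphs) else "right"
    row = list(signs) if direction == "LTR" else list(reversed(signs))
    return [(sign, None if sign in SYMMETRICAL else facing) for sign in row]

def read_into_faces(row):
    """Read the row starting from the side the directional signs face."""
    facings = {facing for _, facing in row if facing is not None}
    ordered = reversed(row) if facings == {"right"} else row
    return "".join(SYLLABLES[sign] for sign, _ in ordered)

for direction, mirror in (("RTL", False), ("LTR", False), ("LTR", True)):
    row = lay_out(BACKING_STORE, direction, mirror)
    label = "mirrored" if mirror else "unmirrored"
    print(direction, label, read_into_faces(row))
```

Under these assumptions only the unmirrored LTR layout is misread, because the glyphs still signal a right-to-left reading while the order has been flipped.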

That Godart did not make this correction in his book when he used LTR 
directionality was an error. I'm sticking by the decision I made when 
I made my fonts, because it is more likely to be right than not. 
There aren't any other scripts in the area which change 
directionality without reversing the glyphs, and Phaistos certainly 
isn't Chinese.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Multiple encodings for 1 character

2002-07-08 Thread Michael Everson

Theodore: Search the Unicode site for "normalization".
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Multiple encodings for 1 character

2002-07-08 Thread Theodore H. Smith

What is going to be done about the confusion generated from 
having multiple ways to encode the same character?

For example, for filenames, OSX will encode an accented Roman 
letter one way, while for filenames Windows will encode it the 
other way. These kinds of confusions are totally expected, if
Unicode will allow more than one way to encode the same 
character.

This means that matching algorithms won't work, because the
characters are different!

Will there be some kind of recommendation of which to avoid? 
Will the Unicode consortium make a standard to say that one of 
these encodings is strongly not recommended, and in fact 
depreciated?

And what about the OS that uses this encoding? How will the 
Unicode consortium make the newly-offending OS change its ways?

And what about the hordes of apps that expect one format but 
don't expect the other? And the hordes of OS-independent apps
(Java? Perl?) that might generate conflicting versions?





RE: Phaistos in ConScript

2002-07-08 Thread Timothy Partridge

Marco recently said:

> > >5. I find that mirroring the signs as you did in your font is
> > >unhistorical. The whole corpus is right-to-left, and the 
> > fact that the signs
> > >were impressed with types makes it impossible that the 
> > signs could have
> > >been reversed. In academic books, it is common practice to 
> > type the disc's
> > >text left-to-right, but the signs are not reversed.
> > [Michael]
> > I have followed Egyptological -- and ancient Egyptian -- practice 
> > here. If the script is represented right-to-left the faces point to 
> > the right so that you read into their faces. If the script direction 
> > is reversed so that it is left-to-right, it is conventional -- among 
> > Egyptologists and ancient Egyptians -- to reverse the signs as well. 
>
> I see. But Hieroglyphs were handwritten, not "typed". Moreover, the
> mirroring of glyphs is actually attested for Egyptian.
>
> > Godart does not reverse the glyphs even though he reverses the 
> > directionality, but I think it is *his* practice which is 
> > ahistorical, and I think it makes the text harder to read. And I 
> > suspect is has to do with the font technology he had in 1994 when he 
> > wrote his book.
>
> It's seems that July 2002 is our disagreement month... I think that Godart
> was perfectly right avoiding assumptions that he could not support: there is
> no reason to think that the Phaistos "script" should work as Egyptian
> hieroglyphs work.

I would support you in this. Michael says that all the scripts in the region
go both ways, but we don't even know that the disk is from the region. (And the
headdresses apparently don't look local.) It might have come some way in trade.

I feel tempted to protest that the characters aren't in the right order, but
someone might take me up on that :-) I'm probably right though!

[The reason I haven't replied directly to Michael's message is that
something about his messages crashes my mail reader when I try it. Apologies
to everyone for accidentally including a load of message headers last time I
tried a workaround.]

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Saying characters out loud (derives from hash, pound, octothorpe?)

2002-07-08 Thread Timothy Partridge

William Overington recently said:

> Still no olde worlde shoppe name with a yogh in though yet?  :-)

Why bother with an old one when there is a current shop with a yogh? Do you
have a newsagent called Menzies in your part of England? (They have spread
from Scotland.) That isn't a zed (or zee) in the name; it's a yogh.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: Phaistos in ConScript

2002-07-08 Thread Marco Cimarosti

Michael Everson wrote:
> How much more imprudent is it to encode it as a unique character when 
> nothing is known about it? :-)

:-)

> >E.g. would you dare to unify it with U+0316 (COMBINING GRAVE 
> ACCENT BELOW)
> >without knowing whether it is a stress mark, a tone mark, a 
> cantillation
> >mark, a vowel muter, a full stop, a comma, a determinative for
> >logographs...?
> 
> I ask again:
> 
> >  > Do you have an analysis of all the signs which take it 
> in the document?
> >
> >Yes, in Louis Godart, "Il disco di Festo: l'enigma di una scrittura",
> >Einaudi (Italy) 1994, ISBN 8806128922. An English 
> translation should now be
> >available.
> 
> OK, I have the English translation of it. But you want the character. 
> You do the work. Please look and tell me by cell number and character 
> (A-I-22, A-IV-1, B-VI-45) where they are actually applied. Be 
> comprehensive. Thanks.

It will be a delightful activity for my vacation. (But I know what my wife
will say: "You aren't bringing *that* book with you again on this
vacation, are you?")

> >  > I agree that those names aren't good. The dotted one 
> occurs at the
> >>  beginning of the text on both sides. PHAISTOS BEGINNING 
> OF TEXT and
> >>  PHAISTOS SEPARATOR then?
> >
> >Still assumptions, but much more reasonable.
> 
> The one does begin the text on both sides, and the other does 
> separate.

I was just implying that nothing more than "reasonable" can be said about
character names for an unknown script. Nobody can honestly say they are
"correct" or "incorrect".

Imagine that these last two paragraphs were the only remains of English, it
would be perfectly reasonable to choose the name ENGLISH BEGINNING OF TEXT
for uppercase "I"...

> >  > I have followed Egyptological -- and ancient Egyptian -- practice
> >>  here. If the script is represented right-to-left the 
> faces point to
> >>  the right so that you read into their faces. If the 
> script direction
> >>  is reversed so that it is left-to-right, it is 
> conventional -- among
> >>  Egyptologists and ancient Egyptians -- to reverse the 
> signs as well.
> >
> >I see. But Hieroglyphs were handwritten, not "typed".
> 
> And carved in stone and wood. Impressed in soft clay 
> probably.

Probably? Never heard such a thing, apart from seals. BTW, Egyptian would have
required a big set of punches, and it would have posed complex kerning
issues.

> Your point?

Handwriting (or hand carving) a mirrored version of a sign has no additional
costs. Impressing a mirrored version of a sign means casting two (golden?)
sets of punches.

However, if you faithfully copy the glyphs seen on the disc, you cannot be
wrong. If you don't, you can be right or wrong, depending on chance.

> >Moreover, the mirroring of glyphs is actually attested for Egyptian.
> 
> Yeah because you have thousands of documents. Mirroring is also 
> attested in Greek and Etruscan. I don't think I've erred in thinking 
> that it would apply to Phaistos in left-to-right directionality.

The signs of Egyptian, Greek and Etruscan were all handwritten; those of
Ph.D. weren't. Anyway, we know that Egyptian, Greek and Etruscan allowed
mirroring; for Ph.D. we simply don't know.

> >  > Godart does not reverse the glyphs even though he reverses the
> >>  directionality, but I think it is *his* practice which is
> >>  ahistorical, and I think it makes the text harder to read. And I
> >>  suspect is has to do with the font technology he had in 
> 1994 when he
> >>  wrote his book.
> >
> >It's seems that July 2002 is our disagreement month... I 
> think that Godart
> >was perfectly right avoiding assumptions that he could not 
> support: there is
> >no reason to think that the Phaistos "script" should work as Egyptian
> >hieroglyphs work.
> 
> No way! *ALL* of the scripts of that part of the world show mirroring 
> of characters when the script direction is reversed. There's no 
> reason to assume that Phaistos would be otherwise.

There are three very good reasons:

1) See the above about costs and planning ahead.

2) AFAIK, it is not true that *all* other scripts in the Mediterranean had
mirroring. Particularly I never heard this for Linear A, Linear B and
Cyprian, which are the most likely relatives of Ph.D.

3) Anyway, we don't know for sure which "part of the world" the Phaistos
Disc is from.

_ Marco




Re: Saying characters out loud (derives from hash, pound,octothorpe?)

2002-07-08 Thread Tex Texin

I have heard:
squiggle for tilde
bang for exclamation mark
hook for question mark.
tex

Barry Caplan wrote:
> 
> At 11:37 AM 7/5/2002 +0100, Michael Everson wrote:
> >>Also, how does one say the U+007E character out loud while reading out the
> >>address of a web page?
> >
> >"Tilde". Get real, William.
> 
> FF5E is colloquially known as a "wave" in Japanese, IIRC, and hence 007E is a "small 
>wave" or "half width wave".
> 
> Barry Caplan
> www.i18n.com

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Saying characters out loud (derives from hash, pound, octothorpe?)

2002-07-08 Thread Barry Caplan

At 11:37 AM 7/5/2002 +0100, Michael Everson wrote:
>>Also, how does one say the U+007E character out loud while reading out the
>>address of a web page?
>
>"Tilde". Get real, William.


FF5E is colloquially known as a "wave" in Japanese, IIRC, and hence 007E is a "small 
wave" or "half width wave".
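
[Archive note: the relationship between the two code points mentioned here can be inspected with Python's standard unicodedata module. This is only a sketch of the character properties; it says nothing about how the character is named in spoken Japanese.]

```python
import unicodedata

# U+007E is the ASCII tilde; U+FF5E is its fullwidth compatibility variant.
for ch in ("\u007E", "\uFF5E"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),  # East Asian Width property
        unicodedata.decomposition(ch) or "(no decomposition)",
    )

# Compatibility normalization folds the fullwidth form to the ASCII one.
folded = unicodedata.normalize("NFKC", "\uFF5E")
print(folded == "\u007E")
```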

Barry Caplan
www.i18n.com





RE: Phaistos in ConScript

2002-07-08 Thread Michael Everson

At 17:40 +0200 2002-07-08, Marco Cimarosti wrote:
>Michael Everson wrote:
>>  >1. Your lacks an important sign, which I would call "PHAISTOS
>>  >COMBINING LINE BELOW". [...]
>>
>>  Um, can't something from General Punctuation be used, in the absence
>>  of knowing more about this "character"?
>
>It seems very imprudent, considering that nothing is known about the nature
>of that sign.

How much more imprudent is it to encode it as a unique character when 
nothing is known about it? :-)

>E.g. would you dare to unify it with U+0316 (COMBINING GRAVE ACCENT BELOW)
>without knowing whether it is a stress mark, a tone mark, a cantillation
>mark, a vowel muter, a full stop, a comma, a determinative for
>logographs...?

I ask again:

>  > Do you have an analysis of all the signs which take it in the document?
>
>Yes, in Louis Godart, "Il disco di Festo: l'enigma di una scrittura",
>Einaudi (Italy) 1994, ISBN 8806128922. An English translation should now be
>available.

OK, I have the English translation of it. But you want the character. 
You do the work. Please look and tell me by cell number and character 
(A-I-22, A-IV-1, B-VI-45) where they are actually applied. Be 
comprehensive. Thanks.

>BTW, the only thing I disliked in this excellent book was the fact that,
>IMHO, Godart was too quick to accept the assumption that this sign could be
>punctuation, and he even uses it to segment the text in "sentences" or
>"verses".

What page or section does he state that specifically?

>Perhaps, it would be useful to have a (non PUA) Unicode symbol to mark
>unidentified characters in any kind of paleographic or critic texts. This
>could be the object of a proposal, or it could be unified with one of the
>existing shaded rectangles.

Markup. In my file I just wrote [.] as Godart did. But for Egyptian
and Cuneiform it's been suggested that markup is the appropriate 
means for showing this element of palaeography.

>  > I agree that those names aren't good. The dotted one occurs at the
>>  beginning of the text on both sides. PHAISTOS BEGINNING OF TEXT and
>>  PHAISTOS SEPARATOR then?
>
>Still assumptions, but much more reasonable.

The one does begin the text on both sides, and the other does separate.

>  > I don't like VERTICAL LINE and DOTTED
>>  VERTICAL LINE very much. That kind of description we usually reserve
>>  for abstract technical symbols rather than punctuation.
>
>Punctuation? Did you discover it is punctuation? :-)

Separators are punctuation. What else? Perhaps it is a 17th-century 
BCE spreadsheet.

>OTOH, you know the Phaistos Disk "translators": for many of them, the
>character names on your CSUR page make enough evidence that PHAISTOS SIGN OX
>BACK was pronounced /bu/. (or even /kau as/ :-)

There are silly people everywhere.

>  > I have followed Egyptological -- and ancient Egyptian -- practice
>>  here. If the script is represented right-to-left the faces point to
>>  the right so that you read into their faces. If the script direction
>>  is reversed so that it is left-to-right, it is conventional -- among
>>  Egyptologists and ancient Egyptians -- to reverse the signs as well.
>
>I see. But Hieroglyphs were handwritten, not "typed".

And carved in stone and wood. Impressed in soft clay probably. Your point?

>Moreover, the mirroring of glyphs is actually attested for Egyptian.

Yeah because you have thousands of documents. Mirroring is also 
attested in Greek and Etruscan. I don't think I've erred in thinking 
that it would apply to Phaistos in left-to-right directionality.

>  > Godart does not reverse the glyphs even though he reverses the
>>  directionality, but I think it is *his* practice which is
>>  ahistorical, and I think it makes the text harder to read. And I
>>  suspect it has to do with the font technology he had in 1994 when he
>>  wrote his book.
>
>It seems that July 2002 is our disagreement month... I think that Godart
>was perfectly right avoiding assumptions that he could not support: there is
>no reason to think that the Phaistos "script" should work as Egyptian
>hieroglyphs work.

No way! *ALL* of the scripts of that part of the world show mirroring 
of characters when the script direction is reversed. There's no 
reason to assume that Phaistos would be otherwise.

>I don't think font technology had anything to do with this choice: from my
>printed edition of "Il disco di Festo" I can see clearly that the text was
>reproduced using little images, not a font (sometimes the borders of the
>film and the adhesive tape are still visible).

Right, so then he had a sheet of drawings photocopied dozens of times 
and pasted them down. He didn't think of directionality in the way we 
do I guess.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Chromatic text. (follows from Re: [unicode] Re: FW:Inappropriate Proposals FAQ)

2002-07-08 Thread Michael Everson

At 15:19 +0100 2002-07-08, William Overington wrote:

>Actually I was trying in the posting upon which you comment to suggest that,
>even if people do not agree with me about having colour codes in a plain
>text file, they might perhaps consider as a separate issue the adding into
>regular Unicode of a zero width operator whose use would be to indicate that
>a character, such as U+1362, should be decorated chromatically.

no no No No NO. Characters are not distinguished by colour, 
unconfirmed statements about Aztec notwithstanding.

>This would mean that a sequence U+1362 ZWJ ZWCDO could be used in 
>documents, which would give a chromatically decorated glyph with a 
>chromatic font yet would just give U+1362 as a monochrome character 
>if the font did not recognize the U+1362 ZWJ ZWCDO sequence.

This is NOT ligation, and it is NOT what the ZWJ is for, and it is 
NOT an appropriate extension of the

>My opinion is that splitting text files into just two categories, either
>plain text or markup is not sufficient, but that there should perhaps be
>more categories or, if there are but two categories that the dividing line
>between them should be in a different place.

Ten billion documents on the internet and the entire course of modern 
text processing indicate that it is unwise to hold the opinion that 
you do. Why don't you simply admit that you have been barking up the 
wrong tree and initiate more useful work? We have been about as civil 
as you can expect, though I am sure you have noticed that my own 
patience with this silliness is about at an end.

>I tend to base the essential dividing line upon whether the encoding 
>of the file of code points is meaningful if one tries to compute the 
>effect of a code point upon the system as simply the effect of that 
>code point as it stands, without having to have software recognize a 
>character such as < and determine that a markup bubble is being 
>entered then to have to read in several more characters within the 
>markup bubble before taking any action as a result of the first 
>character in the sequence (that is, the < character) being read.

Well get over it. You have seriously misconstrued the difference 
between "plain text" and "rich text". Both have been in use for many, 
many years and no one has had much trouble with it. Wondering 
"whether the encoding of the file of code points is meaningful" is 
not going to gain you very much in this line of misreasoning.

>That distinction means that each Unicode character is processed as 
>it is received within the main loop of the program, without the 
>receiving of a < character putting the processing into an inner loop 
>within a markup bubble, within which bubble ordinary Unicode 
>character codes which are read have a
>different meaning than in the Unicode specification.

Processing of characters happens at many different levels. All I can 
say is that it is clear that you do not know what you are talking 
about.

>To me, such a distinction means that people who are using lower cost, more
>generally available software packages, might by such an approach be able in
>the not too distant future to use files in a non-proprietary portable format
>and get much better results than just using monochrome traditional plain
>text.

Balderdash. In the first place, those imaginary people are not 
expressing a user need for your pseudo-solutions. Real people use 
markup to colour their texts, and have been since the first colour 
monitors were introduced and MacWrite made it possible. What was 
that, 15 years ago?

>Perhaps some sort of consensus over nomenclature for three categories of
>text file could occur, namely plain text in the manner which you like it,
>plain text in the manner in which I like it and markup.  Maybe plain text,
>enhanced text and markup would be suitable names.  How do people feel about
>that please?

I feel ill.

>It is unfortunately the case in discussions that when someone disagrees with
>an idea that is put forward that he or she is more likely to respond in
>public than if he or she agrees with an idea which is put forward, or has
>simply read about the idea and just notes it as an interesting possibility.
>This can have the effect that many people may agree with an idea or at least
>not be against it yet make no comment, perhaps giving an impression that an
>idea is not well received at large when in fact that is not necessarily the
>case.

Don't fool yourself. Your "plain text in the manner in which you like 
it" is a lamentable abuse of character codes to effect the same 
results which real markup of various kinds has been able to do for 
decades. I assure you, the ranks of this list are not filled with 
people agreeing with you.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)

2002-07-08 Thread Marco Cimarosti

William Overington wrote:
> Actually I was trying in the posting upon which you comment 
> to suggest that, even if people do not agree with me about
> having colour codes in a plain text file, they might
> perhaps consider as a separate issue the adding into regular
> Unicode of a zero width operator whose use would be 
> to indicate that a character, such as U+1362, should be
> decorated chromatically.

Come on, William!!

Adding such a "zero width operator" *is* having color in plain text!

And adding such "zero width operators" *is* inserting mark up in plain text!

> >I interpret your post as one more lengthy repetition of your 
> well-known
> >opinion: differences between "plain text" and "rich text" 
> should not exist:
> >they should be eliminated by incorporating the mark-up in 
> the encoding.
> 
> Actually, that is not my opinion.

No, I know. This is my explanation of my perception of your explanation of
your opinion. Now I am not sure what your perception of my explanation of my
perception of your explanation of your opinion might be.

Gentlemen, communication is such a difficult art!

> [...]
> Perhaps some sort of consensus over nomenclature for three 
> categories of
> text file could occur, namely plain text in the manner which 
> you like it,
> plain text in the manner in which I like it and markup.  
> Maybe plain text,
> enhanced text and markup would be suitable names.  How do 
> people feel about
> that please?

I would suggest "proletarian text", "middle-class text" and "capitalist
text", if I wasn't so scared that someone could take it seriously.

> It is unfortunately the case in discussions that when someone 
> disagrees with
> an idea that is put forward that he or she is more likely to 
> respond in
> public than if he or she agrees with an idea which is put 
> forward, or has
> simply read about the idea and just notes it as an 
> interesting possibility.
> This can have the effect that many people may agree with an 
> idea or at least
> not be against it yet make no comment, perhaps giving an 
> impression that an
> idea is not well received at large when in fact that is not 
> necessarily the
> case.

Yes, definitely a difficult art.

_ Marco




RE: Phaistos in ConScript

2002-07-08 Thread Marco Cimarosti

Michael Everson wrote:
> >1. Your repertoire lacks an important sign, which I would call "PHAISTOS
> >COMBINING LINE BELOW". [...]
> 
> Um, can't something from General Punctuation be used, in the absence 
> of knowing more about this "character"?

It seems very imprudent, considering that nothing is known about the nature
of that sign.

E.g. would you dare to unify it with U+0316 (COMBINING GRAVE ACCENT BELOW)
without knowing whether it is a stress mark, a tone mark, a cantillation
mark, a vowel muter, a full stop, a comma, a determinative for
logographs...?
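
To make the concern concrete: the Unicode Character Database records a name, a general category and a combining class for U+0316, but none of those properties says what the mark does in a given writing system. A minimal Python sketch, using only the standard unicodedata module (the helper name describe is made up for this illustration):

```python
import unicodedata

def describe(ch):
    """Return (name, general category, canonical combining class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# U+0316 is a nonspacing mark (Mn) that attaches below its base (class 220).
# Nothing in these properties says "stress mark", "tone mark" or "punctuation".
print(describe("\u0316"))  # ('COMBINING GRAVE ACCENT BELOW', 'Mn', 220)
```

Unifying the Phaistos stroke with this character would therefore borrow its shape and position, not any known function.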

> Do you have an analysis of 
> all the signs which take it in the document?

Yes, in Louis Godart, "Il disco di Festo: l'enigma di una scrittura",
Einaudi (Italy) 1994, ISBN 8806128922. An English translation should now be
available.

BTW, the only thing I disliked in this excellent book was the fact that,
IMHO, Godart was too quick to accept the assumption that this sign could be
punctuation, and he even uses it to segment the text into "sentences" or
"verses".

Apart from this detail, Godart did excellent work in delivering all the known
facts and rejecting all fantasy and undemonstrable assumptions.

> >2. The last sign of the tenth group ("word"?) is almost totally
> >lost, due to a crack. However, it seems that none of the 45 
> known signs may
> >fit in the gap. Many scholars consider this to be a 46th 
> sign. The glyph
> >normally used in the literature is a texture of diagonal lines.
> 
> Godart says "The last sign of set A:VIII was not deleted but broke 
> off with a sliver of clay. Bearing in mind the space and outline of the 
> gap, which seems to roughly follow the outline of the broken sign, 
> it seems that the most plausible identification of the mysterious 
> sign is a 3 [TATTOOED HEAD] or a 20 [DOLIUM], unless it is an 8 
> [GAUNTLET] or a 4 [CAPTIVE], which is less likely." I don't want to 
> encode a new character without better evidence (and wouldn't for ANY 
> script). I haven't seen anything from other scholars who consider it 
> a 46th sign.

Godart himself allows for this possibility in the book I mentioned above.
But you are right, encoding this "phantom" character would be a problem in
case the missing character is identified.

Perhaps, it would be useful to have a (non PUA) Unicode symbol to mark
unidentified characters in any kind of paleographic or critical texts. This
could be the object of a proposal, or it could be unified with one of the
existing shaded rectangles.

> >... about the character names:
> >
> >3. The names for E6FE and E6FF ("PHAISTOS PARAGRAPH SEPARATOR" and 
> >"PHAISTOS PHRASE SEPARATOR") show imprudent assumptions. E.g., many 
> >people consider E6FF to be a paragraph or text separator, and E6FE 
> >to be a word separator. It would be more prudent to use a more 
> >generic wording, e.g. "PHAISTOS VERTICAL LINE" and "PHAISTOS 
> >VERTICAL DOTTED LINE".
> 
> I agree that those names aren't good. The dotted one occurs at the 
> beginning of the text on both sides. PHAISTOS BEGINNING OF TEXT and 
> PHAISTOS SEPARATOR then?

Still assumptions, but much more reasonable.

> I don't like VERTICAL LINE and DOTTED 
> VERTICAL LINE very much. That kind of description we usually reserve 
> for abstract technical symbols rather than punctuation.

Punctuation? Did you discover it is punctuation? :-)

> >4. Names such as "pedestrian", "plumed head", ... "wavy band" are
> >just nicknames used by scholars, as opposed to accepted 
> identifications of
> >the objects represented. It may be worth emphasizing this 
> in the character
> >names: e.g., "PHAISTOS SIGN KNOWN AS PEDESTRIAN".
> 
> We either use the numbers given by the scholars, so U+E6D0 can either 
> be called PHAISTOS SIGN-01 or PHAISTOS SIGN PEDESTRIAN. Either way 
> we're using a scholarly designation. The meaningful nicknames are 
> more fun than the numeric ones

"PHAISTOS SIGN-01" would be too meaningless. I still feel ashamed for my
stupid idea that Unicode Kang Xi radicals should have been called "KANG XI
RADICAL 1" .. "KANG XI RADICAL 214".

OTOH, you know the Phaistos Disk "translators": for many of them, the
character names on your CSUR page are evidence enough that PHAISTOS SIGN OX
BACK was pronounced /bu/ (or even /kau as/ :-)

> >... and about the Everson Phaistos font:
> >
> >5. I find that mirroring the signs as you did in your font is
> >unhistorical. The whole corpus is right-to-left, and the 
> fact that the signs
> >were impressed with types makes it impossible that the 
> >been reversed. In academic books, it is common practice to 
> type the disc's
> >text left-to-right, but the signs are not reversed.
> 
> I have followed Egyptological -- and ancient Egyptian -- practice 
> here. If the script is represented right-to-left the faces point to 
> the right so that you read into their faces. If the script direction 
> is reversed so that it is left-to-right, it is conventional -- among 
> Egyptologists and ancient Egyptians --

Re: Chromatic text, ligatures and Fraktur ligatures.

2002-07-08 Thread Doug Ewell

I know I said this before, but this time I'm serious.

I will no longer respond publicly to any post concerning William
Overington's proposed extensions of the kind of things that should be
encoded in Unicode.  That is because I am convinced now that his
misinterpretation of the basic principles of Unicode, and the types of
entities that do and do not make sense for encoding, is willful and not
due to ignorance.

Nobody with the intelligence of a tree could possibly read the
character-glyph document and come away with the impression that font
styles, sizes, colors, etc. are "central" to the notion of what belongs
in character encoding.  Intelligence is clearly not the problem here.

But, because I am not an ad hominem kind of guy, I will be happy to
discuss other topics related to (and appropriate to) Unicode that are
raised by William or anyone else.  In my next message, I want to address
the "large corporate sponsor" angle that William, and others in the
past, have used to argue that Unicode is unresponsive to the needs of
low-end users.

-Doug Ewell
 Fullerton, California





Re: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)

2002-07-08 Thread William Overington

Marco Cimarosti wrote as follows.

>Of course you can. But my feeling is that you already *did* suggest this,
>many, many times.

Actually I was trying in the posting upon which you comment to suggest that,
even if people do not agree with me about having colour codes in a plain
text file, they might perhaps consider as a separate issue the adding into
regular Unicode of a zero width operator whose use would be to indicate that
a character, such as U+1362, should be decorated chromatically.  This would
mean that a sequence U+1362 ZWJ ZWCDO could be used in documents, which
would give a chromatically decorated glyph with a chromatic font yet would
just give U+1362 as a monochrome character if the font did not recognize the
U+1362 ZWJ ZWCDO sequence.

>
>I interpret your post as one more lengthy repetition of your well-known
>opinion: differences between "plain text" and "rich text" should not exist:
>they should be eliminated by incorporating the mark-up in the encoding.
>

Actually, that is not my opinion.

My opinion is that splitting text files into just two categories, either
plain text or markup is not sufficient, but that there should perhaps be
more categories or, if there are but two categories that the dividing line
between them should be in a different place.  I tend to base the essential
dividing line upon whether the encoding of the file of code points is
meaningful if one tries to compute the effect of a code point upon the
system as simply the effect of that code point as it stands, without having
to have software recognize a character such as < and determine that a markup
bubble is being entered then to have to read in several more characters
within the markup bubble before taking any action as a result of the first
character in the sequence (that is, the < character) being read.  That
distinction means that each Unicode character is processed as it is received
within the main loop of the program, without the receiving of a < character
putting the processing into an inner loop within a markup bubble, within
which bubble ordinary Unicode character codes which are read have a
different meaning than in the Unicode specification.
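
The distinction drawn in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the event tuples are invented here, and the tag handling is a toy, not an HTML or XML parser.

```python
def process_stateless(text):
    """Handle each code point as it arrives: no lookahead and no modes."""
    return [f"U+{ord(ch):04X}" for ch in text]

def process_markup(text):
    """Toy tag-aware loop: '<' opens a "markup bubble" that is buffered until '>'."""
    events = []
    tag = None  # None means we are outside any tag
    for ch in text:
        if tag is None:
            if ch == "<":
                tag = ""  # enter the bubble; nothing is emitted yet
            else:
                events.append(("char", ch))
        elif ch == ">":
            events.append(("tag", tag))  # act only once the whole tag is read
            tag = None
        else:
            tag += ch  # inside the bubble, characters are part of the tag name
    return events  # an unterminated tag is silently dropped in this sketch

print(process_stateless("a<"))   # ['U+0061', 'U+003C']
print(process_markup("a<b>c"))   # [('char', 'a'), ('tag', 'b'), ('char', 'c')]
```

In the first function every code point is acted on immediately; in the second, the meaning of a character depends on whether a "<" has been seen, which is the inner-loop behaviour the paragraph describes.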

To me, such a distinction means that people who are using lower cost, more
generally available software packages, might by such an approach be able in
the not too distant future to use files in a non-proprietary portable format
and get much better results than just using monochrome traditional plain
text.

Perhaps some sort of consensus over nomenclature for three categories of
text file could occur, namely plain text in the manner which you like it,
plain text in the manner in which I like it and markup.  Maybe plain text,
enhanced text and markup would be suitable names.  How do people feel about
that please?

It is unfortunately the case in discussions that when someone disagrees with
an idea that is put forward that he or she is more likely to respond in
public than if he or she agrees with an idea which is put forward, or has
simply read about the idea and just notes it as an interesting possibility.
This can have the effect that many people may agree with an idea or at least
not be against it yet make no comment, perhaps giving an impression that an
idea is not well received at large when in fact that is not necessarily the
case.

William Overington

8 July 2002











RE: Phaistos in ConScript

2002-07-08 Thread Michael Everson

At 14:11 +0200 2002-07-08, Marco Cimarosti wrote:
>Michael Everson wrote:
>>  A Unicode-enabled font based on the ConScript encoding and a test
>>  page containing the entire Phaistos corpus can be found at
>>  http://www.evertype.com/standards/csur/phaistos-sample.html.
>
>I have a few notes about the repertoire:
>
>1. Your repertoire lacks an important sign, which I would call "PHAISTOS
>COMBINING LINE BELOW". This is the only handwritten sign on the disc; it is
>not clear whether it is some kind of diacritic (e.g. a sort of virama) or a
>punctuation sign. At any rate, it is clear that the sign has been
>deliberately written under the last signs of some groups ("words"?).

Um, can't something from General Punctuation be used, in the absence 
of knowing more about this "character"? Do you have an analysis of 
all the signs which take it in the document?

>2. The last sign of the tenth group ("word"?) is almost totally
>lost, due to a crack. However, it seems that none of the 45 known signs may
>fit in the gap. Many scholars consider this to be a 46th sign. The glyph
>normally used in the literature is a texture of diagonal lines.

Godart says "The last sign of set A:VIII was not deleted but broke 
off with a sliver of clay. Bearing in mind the space and outline of the 
gap, which seems to roughly follow the outline of the broken sign, 
it seems that the most plausible identification of the mysterious 
sign is a 3 [TATTOOED HEAD] or a 20 [DOLIUM], unless it is an 8 
[GAUNTLET] or a 4 [CAPTIVE], which is less likely." I don't want to 
encode a new character without better evidence (and wouldn't for ANY 
script). I haven't seen anything from other scholars who consider it 
a 46th sign.

>... about the character names:
>
>3. The names for E6FE and E6FF ("PHAISTOS PARAGRAPH SEPARATOR" and 
>"PHAISTOS PHRASE SEPARATOR") show imprudent assumptions. E.g., many 
>people consider E6FF to be a paragraph or text separator, and E6FE 
>to be a word separator. It would be more prudent to use a more 
>generic wording, e.g. "PHAISTOS VERTICAL LINE" and "PHAISTOS 
>VERTICAL DOTTED LINE".

I agree that those names aren't good. The dotted one occurs at the 
beginning of the text on both sides. PHAISTOS BEGINNING OF TEXT and 
PHAISTOS SEPARATOR then? I don't like VERTICAL LINE and DOTTED 
VERTICAL LINE very much. That kind of description we usually reserve 
for abstract technical symbols rather than punctuation.

>4. Names such as "pedestrian", "plumed head", ... "wavy band" are
>just nicknames used by scholars, as opposed to accepted identifications of
>the objects represented. It may be worth emphasizing this in the character
>names: e.g., "PHAISTOS SIGN KNOWN AS PEDESTRIAN".

We either use the numbers given by the scholars, so U+E6D0 can either 
be called PHAISTOS SIGN-01 or PHAISTOS SIGN PEDESTRIAN. Either way 
we're using a scholarly designation. The meaningful nicknames are 
more fun than the numeric ones
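
For reference, U+E6D0 lies in the Private Use Area, so the standard itself assigns it no name; names such as PHAISTOS SIGN-01 or PHAISTOS SIGN PEDESTRIAN exist only by private agreement (here, the ConScript registry). A short Python check with the standard unicodedata module shows this; the helper is_private_use and the placeholder string are assumptions made for the example.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"

sign = "\ue6d0"  # ConScript code point discussed above
print(is_private_use(sign))                           # True
print(unicodedata.name(sign, "<no standard name>"))   # <no standard name>
```

Software that has not been told about the ConScript assignment sees only an unnamed private-use code point, whichever naming scheme is chosen.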

>... and about the Everson Phaistos font:
>
>5. I find that mirroring the signs as you did in your font is
>unhistorical. The whole corpus is right-to-left, and the fact that the signs
>were impressed with types makes it impossible that the signs could have
>been reversed. In academic books, it is common practice to type the disc's
>text left-to-right, but the signs are not reversed.

I have followed Egyptological -- and ancient Egyptian -- practice 
here. If the script is represented right-to-left the faces point to 
the right so that you read into their faces. If the script direction 
is reversed so that it is left-to-right, it is conventional -- among 
Egyptologists and ancient Egyptians -- to reverse the signs as well. 
Godart does not reverse the glyphs even though he reverses the 
directionality, but I think it is *his* practice which is 
ahistorical, and I think it makes the text harder to read. And I 
suspect it has to do with the font technology he had in 1994 when he 
wrote his book.

>IMHO, the two characters in points 1 and 2 are absolutely needed. Academic works
>which consider them as part of the script could not be encoded without them,
>while academic works which don't need them are not disturbed by their
>existence in the encoding.

I didn't think so. Any counter-arguments to the above?

I suppose this discussion could be instructive to potential 
script-proposers out there... ;-)
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Sinhala Unicode

2002-07-08 Thread Alan Wood

It was recently mentioned that there don't seem to be any Unicode fonts that
include Sinhala.

Wayne Albury recently drew my attention to Helawadana 2000, which claims to
allow editing of Sinhala (and Tamil) in Windows applications using Unicode
fonts.

I have not tried it (it costs $99 and the links to order it don't work!).
For more information, see:

http://www.microimage.com/helawadana/

Alan Wood
Documentation Writer / Web Master
Context Limited (http://www.context.co.uk)
mailto:[EMAIL PROTECTED]
http://www.alanwood.net (Unicode, special characters, pesticide names)





Re: Chromatic text, ligatures and Fraktur ligatures

2002-07-08 Thread Michael Everson

At 10:40 +0100 2002-07-08, William Overington wrote:
>Michael Everson wrote as follows.
>
>>Your courtyard codes and your scientific chromatic explorations are
>>not appropriate uses of the standard. With Quark XPress I can set my
>>fonts to display in HUNDREDS OF THOUSANDS if not MILLIONS of colours,
> .
>
>Courtyard codes and chromatic fonts are, in my opinion, entirely appropriate
>uses of the standard.

You would be wrong.

>Recently I was referred to an ISO document about characters and glyphs,
>ISO/IEC TR 15285. [...] Courtyard codes and codes for chromatic 
>fonts, in my opinion, fall within the definition of character in 
>Annex B of that document.

Then you have not understood the definition, or you are twisting it 
to your own ends. The question is, are you twisting it because you 
really just don't get it, or are you doing this deliberately to waste 
our time and get some attention? Because it sure looks like one or 
the other at this point.

>Courtyard codes also allow the use of millions of colours.  There are 18
>codes for changing colour, 16 for specific colours and 2 for colour 98 and
>colour 99 which can be set to any of those millions of colours using other
>courtyard codes.

This "technology" is useless because there are already solutions in 
use by REAL applications involving text markup.

>Courtyard codes are, in my opinion, very important for the future of
>broadcasting using the DVB-MHP system.  They will enable Unicode text files
>to carry colour and formatting information which can be straightforwardly
>interpreted by a variety of relatively small Java programs from a variety of
>content providers.

There are other methods of carrying colour and formatting information 
which are already in use. It is called markup.

>The advantages for the broadcasting of educational multimedia across 
>whole continents will be enormous if a consistent set of codes for 
>colours and basic formatting is widely used in a consistent manner.

Doh! Look! They've already invented it! IT'S CALLED MARKUP. Woo-hoo!

>Certainly if such a set were provided in plane 0 of regular Unicode
>then that would be magnificent, yet in any case, that takes time and the
>need to gain a consensus as to the use of a particular set of codes is now,
>and courtyard codes are, as far as I am aware, the only set of codes
>available to do the job at the present time.

You've deluded yourself into thinking that this is the way it should 
be done. It isn't, and therefore Unicode will never contain such 
codes. Get it? You're wasting your time and ours.

>  >If you can't support Unicode on older
>>systems then that's because the systems aren't good enough.
>
>Ah!  A digital divide issue.

You've misused the term "digital divide". It does not have to do with 
software versioning.

>Windows 95 and Windows 98 systems, which are
>not very old at all, cannot, as far as I am aware, support advanced font
>technology such as OpenType.  In addition, these advanced font technologies
>are not part of the international standards and it seems to me that it is a
>good thing for Unicode to provide facilities for advanced font usage, yet
>quite another thing to start cutting off support routes for users of older
>equipment, even when that equipment is only three years old.

Tough. That's the nature of software development. You try to support 
older data, but you don't resort to hacks to simulate new 
technological abilities in old systems. You take it as read that 
people will have to upgrade their software, hardware, memory, or 
whatever.

Advanced font technologies should not be part of international 
standards. That isn't what international standards are for. Unicode, 
as has been pointed out to you before, isn't an international 
standard, although its repertoire and architecture are identical with 
those of ISO/IEC 10646.

>  >Are PUA hacks to fix that a productive use of energy? One can't support
>  >everything in legacy data.
>
>You appear to be referring to my definition of the golden ligatures 
>collection.

All of your PUA "work", actually, not just that particular one.

>Well, first of all, I feel that the word "hack" is inappropriate. 
>The golden ligatures collection is a published list of Private Use 
>Area allocations.  The documents clearly state what they are and 
>what they are not.

It allocates code positions for ligatures when it is the stated 
intent of the standard not to do so. And it does so in order to 
provide some sort of bogus support for "older systems". I think 
"hack" is quite descriptive of what you are trying to achieve via 
character encoding as opposed to markup.

>The fact of the matter is that people who vote on these matters, largely
>only having a vote because they are the representatives of large
>corporations, have decided that no more precomposed ligatures will be added
>into Unicode.

Because the ones that are already there are only to support legacy 
data, and they are not recommended for us

RE: Chromatic text. (follows from Re: [unicode] Re: FW: Inappropriate Proposals FAQ)

2002-07-08 Thread Marco Cimarosti

William Overington wrote:
> >The problem (if there is one!) is only for font technology.
> >
> >> Ethiopian writing: [...] "The capability to the same electronically
> >> would be well received. /Daniel."
> >
> >Same for this one: Unicode's task was to provide a code point for the
> >Ethiopic full stop, and they did. Whether the corresponding glyph is colored
> >or not is a problem for fonts and word processors.
> 
> Well, may I please suggest that the issue is one for Unicode 
> as well as for font technology?
>
> [...]

Of course you can. But my feeling is that you already *did* suggest this,
many, many times.

I interpret your post as one more lengthy repetition of your well-known
opinion: differences between "plain text" and "rich text" should not exist:
they should be eliminated by incorporating the mark-up in the encoding.

I think that it is your right to repeat your opinions as many times as you
want. Nevertheless, I find that repeating opinions which are already
well-known to everybody is *useless* and *boring*.

_ Marco




RE: Phaistos in ConScript

2002-07-08 Thread Marco Cimarosti

Michael Everson wrote:
> A Unicode-enabled font based on the ConScript encoding and a test 
> page containing the entire Phaistos corpus can be found at 
> http://www.evertype.com/standards/csur/phaistos-sample.html.

I have a few notes about the repertoire:

1. Your repertoire lacks an important sign, which I would call "PHAISTOS
COMBINING LINE BELOW". This is the only handwritten sign on the disc; it is
not clear whether it is some kind of diacritic (e.g. a sort of virama) or a
punctuation sign. At any rate, it is clear that the sign has been
deliberately written under the last signs of some groups ("words"?).

2. The last sign of the tenth group ("word"?) is almost totally
lost, due to a crack. However, it seems that none of the 45 known signs may
fit in the gap. Many scholars consider this to be a 46th sign. The glyph
normally used in the literature is a texture of diagonal lines.

... about the character names:

3. The names for E6FE and E6FF ("PHAISTOS PARAGRAPH SEPARATOR" and
"PHAISTOS PHRASE SEPARATOR") show imprudent assumptions. E.g., many people
consider E6FF to be a paragraph or text separator, and E6FE to be a word
separator. It would be more prudent to use a more generic wording, e.g.
"PHAISTOS VERTICAL LINE" and "PHAISTOS VERTICAL DOTTED LINE".

4. Names such as "pedestrian", "plumed head", ... "wavy band" are
just nicknames used by scholars, as opposed to accepted identifications of
the objects represented. It may be worth emphasizing this in the character
names: e.g., "PHAISTOS SIGN KNOWN AS PEDESTRIAN".

... and about the Everson Phaistos font:

5. I find that mirroring the signs as you did in your font is
unhistorical. The whole corpus is right-to-left, and the fact that the signs
were impressed with types makes it impossible that the signs could have
been reversed. In academic books, it is common practice to type the disc's
text left-to-right, but the signs are not reversed.

IMHO, the two characters in points 1 and 2 are absolutely needed. Academic works
which consider them as part of the script could not be encoded without them,
while academic works which don't need them are not disturbed by their
existence in the encoding.

_ Marco




longs

2002-07-08 Thread Michael Everson

At 11:34 +0200 2002-07-08, Stefan Persson wrote:
>- Original Message -
>From: "John H. Jenkins" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Monday, July 08, 2002 12:56 AM
>Subject: Re: How do I encode HTML documents in old languages ſuch as 17th
>century Swediſh in Unicode?
>
>
>>  On Wednesday, July 3, 2002, at 11:10 AM, Stefan Persson wrote:
>>
>>  > There is a big problem in the current Unicode ſtandard, ſince
>>  > Fraktur letters aren't ſupported in any ſuitable manner.
>>
>>  Aargh!  Medial long-s!  Run away!  Run away!  :-)
>
>Why ſhould I not uſe old characters that already were out-of-uſe centuries
>ago? ;-)

Becaufe it piffes people off? :-)
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: utf-8 and databases

2002-07-08 Thread Paul Hastings

> The primary concern is whether a database is able to represent the entire

this was a question that came up about older middleware (cf5) that couldn't
properly handle unicode, some folks (me included) were stuffing utf-8 into
databases that didn't understand it (ie a char became a series of bytes).
the question became, "are there any dbs that do"? and finally "how do you
tell"?
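
The "char became a series of bytes" effect, and one rough way to "tell", can be sketched in Python. No real database is involved: the store is simulated with encode/decode calls, and the helper names (stored_length, misread_as_latin1, round_trips) are invented for this illustration.

```python
def stored_length(text, encoding="utf-8"):
    """Number of bytes a byte-oriented column would hold for this text."""
    return len(text.encode(encoding))

def misread_as_latin1(text):
    """What comes back if UTF-8 bytes are stored, then read as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")

def round_trips(text, store_encoding):
    """Rough test: does the text survive a trip through the store's encoding?"""
    try:
        return text.encode(store_encoding).decode(store_encoding) == text
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"  # one character, LATIN SMALL LETTER E WITH ACUTE
print(stored_length(e_acute))                 # 2
print(len(misread_as_latin1(e_acute)))        # 2 (one character now looks like two)
print(round_trips("\u0e01", "latin-1"))       # False (THAI CHARACTER KO KAI)
print(round_trips("\u0e01", "utf-8"))         # True
```

Against a real database the same idea applies: write a known non-ASCII string, read it back, and compare both the value and its length in characters, not bytes.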









Chromatic text, ligatures and Fraktur ligatures. (derives from Re: Chromatic text)

2002-07-08 Thread William Overington

Michael Everson wrote as follows.

>Your courtyard codes and your scientific chromatic explorations are
>not appropriate uses of the standard. With Quark XPress I can set my
>fonts to display in HUNDREDS OF THOUSANDS if not MILLIONS of colours,
 .

Courtyard codes and chromatic fonts are, in my opinion, entirely appropriate
uses of the standard.

Recently I was referred to an ISO document about characters and glyphs,
ISO/IEC TR 15285.  This is available in a zipped format as follows.  It
unzips to a .pdf file.

http://www.iso.ch/iso/en/ittf/PubliclyAvailableStandards/C027163e.zip

Courtyard codes and codes for chromatic fonts, in my opinion, fall within
the definition of character in Annex B of that document.  This is not me
finding some definition tucked away obscurely, it is central.  The
introduction section of the document states as follows.

quote

This Technical Report is written for a reader who is familiar with the work
of SC 2 and SC 18.  Readers without this background should first read Annex
B, "Characters" and Annex C, "Glyphs".

end quote

Courtyard codes also allow the use of millions of colours.  There are 18
codes for changing colour, 16 for specific colours and 2 for colour 98 and
colour 99 which can be set to any of those millions of colours using other
courtyard codes.  Indeed, it is possible to use them with colours of more
than 8 bits per colour channel so that they could be used for the high
definition colour option of .png files if so desired.  I may add a code into
courtyard codes to signal that use option explicitly.

Lots of programs can use millions of colours, both expensive programs and
widely available ones.  It is part of modern computing.  For example, the
Microsoft Paint program can be used for preparing illustration files
using a particular set of colours chosen from the millions of colours which
it can produce.  There is an article about such a use, in relation to
preparing artwork for broadcasting on the DVB-MHP (Digital Video
Broadcasting - Multimedia Home Platform) system, at the following address.

http://www.users.globalnet.co.uk/~ngo/pai07000.htm

Courtyard codes are, in my opinion, very important for the future of
broadcasting using the DVB-MHP system.  They will enable Unicode text files
to carry colour and formatting information which can be straightforwardly
interpreted by a variety of relatively small Java programs from a variety of
content providers.  The advantages for the broadcasting of educational
multimedia across whole continents will be enormous if a consistent set of
codes for colours and basic formatting is widely used in a consistent
manner.  Certainly, if such a set were provided in plane 0 of regular
Unicode, that would be magnificent.  That takes time, however, and the need
for a consensus on the use of a particular set of codes is now.  Courtyard
codes are, as far as I am aware, the only set of codes available to do the
job at the present time.

>If you can't support Unicode on older
>systems then that's because the systems aren't good enough.

Ah!  A digital divide issue.  Windows 95 and Windows 98 systems, which are
not very old at all, cannot, as far as I am aware, support advanced font
technology such as OpenType.  In addition, these advanced font technologies
are not part of the international standards and it seems to me that it is a
good thing for Unicode to provide facilities for advanced font usage, yet
quite another thing to start cutting off support routes for users of older
equipment, even when that equipment is only three years old.

> Are PUA
>hacks to fix that a productive use of energy? One can't support
>everything in legacy data.

You appear to be referring to my definition of the golden ligatures
collection.  Well, first of all, I feel that the word "hack" is
inappropriate.  The golden ligatures collection is a published list of
Private Use Area allocations.  The documents clearly state what they are and
what they are not.

http://www.users.globalnet.co.uk/~ngo/golden.htm

The fact of the matter is that the people who vote on these matters, most
of whom have a vote only because they represent large corporations, have
decided that no more precomposed ligatures will be added to Unicode.  I have
accepted that this is the situation and that it is pointless to seek to get
the decision changed.  So I have published the golden ligatures collection,
and if it gets widely used, then good.  Since you raise the matter,
however, I do feel that adding U+FB07 as a ct ligature would be useful and,
indeed, the golden ligatures collection is designed so that the chosen code
points dovetail nicely with the code points of the U+FB.. block of regular
Unicode: the issue seems more one of the politics of simply ignoring the
needs of people who are not using the very l

Re: Re: How do I encode HTML documents in old languages ſuch as 17th century Swediſh in Unicode?

2002-07-08 Thread Stefan Persson

- Original Message -
From: "John H. Jenkins" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 08, 2002 12:56 AM
Subject: Re: How do I encode HTML documents in old languages ſuch as 17th
century Swediſh in Unicode?


> On Wednesday, July 3, 2002, at 11:10 AM, Stefan Persson wrote:
>
> > There is a big problem in the current Unicode ſtandard, ſince
> > Fraktur letters aren't ſupported in any ſuitable manner.
>
> Aargh!  Medial long-s!  Run away!  Run away!  :-)

Why ſhould I not uſe old characters that already were out-of-uſe centuries
ago? ;-)

Stefan







Re: utf-8 and databases

2002-07-08 Thread Tex Texin

Asmus is right that you shouldn't blithely assume that the encoding
itself gives a performance advantage.
However, I think this is more true when looking at software program
efficiency than database efficiency.

For example, some databases preallocate storage for records based on the
fixed width of the record as n characters, and then allocate the maximum
byte size of a character times n characters.  So a 100-character record
requires 400 bytes for each record, even though much of the data might
actually be only one- or two-byte characters.

You can then see some large growth in utf-8 databases over utf-16 (where
the utf-16 versions allocate 16 bits instead of the maximal 32 per
character).
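
To put numbers on the preallocation point, here is a minimal sketch in
Python.  The function names are made up for illustration, and the
"reserve the worst case for every character" rule is only the behaviour
described above, not that of any particular database product.

```python
def reserved_bytes(n_chars: int, max_bytes_per_char: int) -> int:
    """Bytes a fixed-width scheme sets aside: n characters at the worst case."""
    return n_chars * max_bytes_per_char


def used_bytes(text: str, codec: str) -> int:
    """Bytes the text actually occupies in the given encoding form."""
    return len(text.encode(codec))


record = "a" * 100  # 100 ASCII characters, the best case for UTF-8

utf8_reserved = reserved_bytes(100, 4)   # 4 bytes is the UTF-8 maximum
utf16_reserved = reserved_bytes(100, 2)  # one 16-bit unit per character, as described above
utf8_used = used_bytes(record, "utf-8")
utf16_used = used_bytes(record, "utf-16-le")

print(utf8_reserved, utf8_used)    # 400 100
print(utf16_reserved, utf16_used)  # 200 200
```

With ASCII-only data the UTF-8 record needs only a quarter of what was
reserved for it, which is the growth referred to above.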

Similarly index keys are affected and if the key size has a low limit,
choosing one encoding over the other might give migration headaches.

I think Asmus and I are both saying you are likely asking the wrong
question. The encoding choice is a "don't care", since there is a 1-1
relationship and a simple efficient algorithm for going between them.
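
As a rough illustration of that 1-1 relationship, the sketch below goes
from UTF-8 to UTF-16 and back using Python's built-in codecs.  The helper
names and the sample string are assumptions chosen for the example; a
database would use its own conversion routines.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Re-express UTF-8 bytes as UTF-16 (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-express UTF-16 (little-endian) bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Sample text with a long s (two UTF-8 bytes) and a character outside
# plane 0 (four UTF-8 bytes, a surrogate pair in UTF-16).
sample = "Swedi\u017fh \U0001D11E"

as_utf8 = sample.encode("utf-8")
as_utf16 = utf8_to_utf16(as_utf8)
back_to_utf8 = utf16_to_utf8(as_utf16)

# Nothing is lost in either direction.
print(back_to_utf8 == as_utf8)
```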

What you really want to ask of the vendor, and/or be testing for, is
given the kinds of data and operations you need to perform, how
efficient is the database at using its storage facilities, retrieving
the data, and executing the various operations (search, sort, etc.),
for each encoding.

hth
tex


Asmus Freytag wrote:
> 
> At 02:11 PM 7/7/02 +0700, Paul Hastings wrote:
> >is there a standard test that can determine whether a given
> >database can handle utf-8 (ie as "native" utf-8 not converting
> >to ucs-2 or whatever)?
> 
> Why is that of any interest?
> 
> The primary concern is whether a database is able to represent the entire
> repertoire of Unicode. Just create a string that contains the largest
> character 0x10FFFD, convert it to whatever encoding form the APIs require
> and see whether you get it back unmolested.
> 
> A more sophisticated test would take a longer string and attempt to sniff
> out incorrect truncation of characters.
> 
> A secondary concern is performance. If the choice of encoding form is a
> poor match for the actual data encountered, and if entering and retrieving
> the data requires too many transcoding steps, it's conceivable that this
> could be detected in the overall performance of the database.
> 
> However, there's no reason to assume that a theoretical match in encoding
> efficiency translates automatically into a more efficient database
> implementation.
> Therefore, regular benchmarking tools should be fine to determine database
> performance, as long as the test data is representative for the installation.
> 
> A./

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: utf-8 and databases

2002-07-08 Thread Asmus Freytag

At 02:11 PM 7/7/02 +0700, Paul Hastings wrote:
>is there a standard test that can determine whether a given
>database can handle utf-8 (ie as "native" utf-8 not converting
>to ucs-2 or whatever)?

Why is that of any interest?

The primary concern is whether a database is able to represent the entire 
repertoire of Unicode. Just create a string that contains the largest 
character 0x10FFFD, convert it to whatever encoding form the APIs require 
and see whether you get it back unmolested.

A more sophisticated test would take a longer string and attempt to sniff 
out incorrect truncation of characters.
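
A minimal sketch of such a round-trip check, in Python.  SQLite (through
the standard sqlite3 module) is only a stand-in here; the table name, the
helper function and the test strings are all assumptions for illustration.
Against a real product you would go through that vendor's own API and
column types, where length limits and truncation are more likely to show.

```python
import sqlite3


def round_trips(conn: sqlite3.Connection, text: str) -> bool:
    """Store text, read it back, and report whether it came back unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS probe (id INTEGER PRIMARY KEY, val TEXT)"
    )
    cur = conn.execute("INSERT INTO probe (val) VALUES (?)", (text,))
    row = conn.execute(
        "SELECT val FROM probe WHERE id = ?", (cur.lastrowid,)
    ).fetchone()
    return row[0] == text


conn = sqlite3.connect(":memory:")  # throwaway in-memory database

highest = chr(0x10FFFD)  # the character named above
# A longer string mixing characters that take 1, 2, 3 and 4 bytes in UTF-8,
# so that a cut in the middle of a character would show up as a mismatch.
mixed = ("a\u00e5\u4e2d" + highest) * 50

results = {
    "single": round_trips(conn, highest),
    "mixed": round_trips(conn, mixed),
}
print(results)
```

A False in either entry would mean the string did not come back as it
went in.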

A secondary concern is performance. If the choice of encoding form is a 
poor match for the actual data encountered, and if entering and retrieving 
the data requires too many transcoding steps, it's conceivable that this 
could be detected in the overall performance of the database.

However, there's no reason to assume that a theoretical match in encoding 
efficiency translates automatically into a more efficient database 
implementation.
Therefore, regular benchmarking tools should be fine to determine database 
performance, as long as the test data is representative for the installation.

A./