Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Stefan Persson

- Original Message -
From: Asmus Freytag [EMAIL PROTECTED]
To: Karl Pentzlin [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: 31 January 2002 22:09
Subject: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT
SELECTOR (was: Re: Proposing Fraktur))


 A more productive distinction would be along these lines:

 a) is the feature necessary for correctly expressing the content

Yes.

 b) is the feature rule based, and

Yes.

 b.1) is the rule implementable w/o knowledge of semantics, or

No.

 c) when implementing the feature, is it necessary to
 c.1) provide scope information, or

Yes.

 c.2) is local context sufficient

No.

 Leaving out italics from a document can not only change the level of
 emphasis, but for example in English, there are occasional circumstances
 where the use of italics removes a possible ambiguity in interpreting
 a sentence. Nevertheless (except for mathematics) italics were left to
 a higher level protocol (style markup).

Italics is better supported than Fraktur, as most word processors have an
option for using italics with any font installed on the computer. For
Fraktur one has to use a different font. No Fraktur font is widely
installed on Windows computers or the like, so it's almost impossible to
use Fraktur text in any public document or similar without resorting to
bitmaps.

Why was Fraktur supported for mathematics, but not for old
Swedish/German/etc.?

Stefan


_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com





Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread John Hudson

At 07:35 2/3/2002, Stefan Persson wrote:

Italics is better supported than Fraktur, as most word processors have an
option for using italics with any font installed on the computer. For
Fraktur one has to use a different font.

Um, for italics one has to use a different font also. Many programs provide 
an italics button that activates the italic member of a font family, but 
this still involves selecting a separate font.

There is no Fraktur font widely
spread on all Windows computers or something like that, so it's almost
impossible to use Fraktur text in any public document or similar w/o using
bitmaps.

There are plenty of Fraktur and other blackletter fonts available. Many of 
the best ones are available from Linotype in Germany. If you think that a 
Fraktur font should come installed on operating systems, you should 
petition your OS developer.

I don't see that these font availability issues have anything to do with 
Unicode.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Michael (michka) Kaplan

From: John Hudson [EMAIL PROTECTED]

 Um, for italics one has to use a different font also. Many
 programs provide an italics button that activates the italic
 member of a font family, but this still involves selecting a
 separate font.

Au contraire, sir! Many fonts *do* have separate .TTF files for the
italic version, but there are just as many that do not, yet the italic option
does not find itself disabled in programs.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Curtis Clark

At 10:25 AM 2/3/02, John Hudson wrote:
Um, for italics one has to use a different font also. Many programs 
provide an italics button that activates the italic member of a font 
family, but this still involves selecting a separate font.

And it would be simple to set up a font family so that Fraktur would be the 
normal state, and the italic button on the word processor would select a 
Roman member of the family (if you still needed sloped italics, those could 
be assigned to the bold italic slot).


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/






Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread John Hudson

At 10:55 2/3/2002, Michael (michka) Kaplan wrote:

  Um, for italics one has to use a different font also. Many
  programs provide an italics button that activates the italic
  member of a font family, but this still involves selecting a
  separate font.

Au contraire, sir! Many fonts *do* have separate .TTF files for the
italic version, but there are just as many that do not, yet the italic option
does not find itself disabled in programs.

Ah. Those 'italics'. Those are not italics. Those are slanted romans. 
Sorry, I thought we were talking about typography.

In Adobe InDesign, the italic function is disabled if an italic font is not 
available. There is a separate control for slanting text, but it is not 
possible to accidentally produce a sloped roman in the absence of an italic 
font. This is how it should be.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






RE: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-02-01 Thread Oliver Christ

Hi, 

Ken wrote: 

 <fraktur>Das sinkende Schiff sandte</fraktur> SOS<fraktur>-Rufe.</fraktur>
 or conversely, perhaps better:
 Das sinkende Schiff sandte <antiqua>SOS</antiqua>-Rufe.

In the end, it may be more useful to mark up the semantics rather than the
formatting properties, i.e.

This is not a question of <foreign origin="DE">Zeitgeist</foreign>.

It is the responsibility of the rendering engine (style sheet, ...) to map
that markup to whatever font/script/typeface should be used, according to
users' (or typesetters') preferences, current environment and purpose. 

- The author or some post-authoring process would (hopefully ;-) ) have the
knowledge about where the linguistic expression originates from and can
apply appropriate (semantic) markup, but doesn't need to care about
typesetting conventions (which the author may not be expert in).

- The rendering engine/typesetter doesn't need to have any linguistic
information (such as a database of loan words), but only needs to know how
to map foreign content to formatting properties in a given context. 

- Third, depending on the environment and purpose, different stylistic
conventions may be necessary for the same linguistic expression (fraktur in
one document, no special formatting in another) so that any
formatting-oriented markup (or encoding, for that matter) will potentially
reduce the reusability of the document.
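
The division of labour described in the three points above can be sketched in Python. This is only an illustration under stated assumptions: the `<s>` wrapper element, the style names and the bracketed `[antiqua]` output notation are hypothetical; only the `foreign` element with its `origin` attribute is taken from the example in this message.

```python
import xml.etree.ElementTree as ET

# Semantic markup only: the author says *what* the word is, not how it looks.
SOURCE = '<s>This is not a question of <foreign origin="DE">Zeitgeist</foreign>.</s>'

# Hypothetical style sheets: each maps "foreign" content to a presentation.
STYLES = {
    "fraktur_book": lambda text: f"[antiqua]{text}[/antiqua]",
    "modern_english": lambda text: f"*{text}*",
    "plain": lambda text: text,
}

def render(xml_text, style):
    """Render the marked-up sentence according to the chosen style."""
    root = ET.fromstring(xml_text)
    fmt = STYLES[style]
    parts = [root.text or ""]
    for child in root:
        content = child.text or ""
        # Only <foreign> gets special treatment; other elements pass through.
        parts.append(fmt(content) if child.tag == "foreign" else content)
        parts.append(child.tail or "")
    return "".join(parts)
```

The same source renders as `This is not a question of *Zeitgeist*.` under `"modern_english"` and without any decoration under `"plain"`, which is the reusability argument made in the third point.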


Cheers, Oli

Oliver Christ
TRADOS GmbH
Stuttgart




RE: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-31 Thread Yves Arrouye

 quite a lot of space. However, Fraktur is already encoded in the
 Mathematical whatever-it's-called block. This variant selector would mean
 that lots of characters can be displayed in two *different* ways. I'd
 prefer that Fraktur diacritics were added instead, and that the
 mathematical letters were to be used for Fraktur texts.

I hope not. These were encoded there because they convey a specific meaning
when used for mathematics. If you use them to spell out names, then you're
abusing them and potentially confusing software that would rely on their
mathematical semantics.
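
One way to see what software makes of these letters is to ask Python's standard `unicodedata` module. The sketch below is illustrative only; the helper names `describe` and `fold_compat` are made up for this example, while the code points are real characters of the Mathematical Alphanumeric Symbols block.

```python
import unicodedata

MATH_FRAKTUR_CAPITAL_A = "\U0001D504"
MATH_FRAKTUR_SMALL_B = "\U0001D51F"

def describe(ch):
    """Return (character name, general category, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.normalize("NFKC", ch),
    )

def fold_compat(text):
    """Apply compatibility normalization, as search or indexing code might."""
    return unicodedata.normalize("NFKC", text)
```

`describe(MATH_FRAKTUR_CAPITAL_A)` reports the name `MATHEMATICAL FRAKTUR CAPITAL A`, and `fold_compat` turns a word spelled with these letters back into plain Latin letters, so any Fraktur-versus-Roman distinction carried this way would not survive such processing.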

I think it's time to have another proposal for French, FRENCH VARIANT
SELECTOR, where we do not use Fraktur but some other font variation. And we
may need a QUEBEC VARIANT SELECTOR if they have different rules... Or should
it be a QUEBEC FRENCH VARIANT SELECTOR to show the relationship?

YA





Re: Proposing Fraktur

2002-01-31 Thread Stefan Persson

- Original Message -
From: Kenneth Whistler [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: 31 January 2002 01:04
Subject: Re: Proposing Fraktur


  And so what? I thought the meaning of Unicode was that all languages
  should be fully supported in plain text, using one single font to display
  all of the characters. With old Swedish, this isn't possible.

 I think this misconstrues the mission of Unicode as an encoding. The goal
 is to encode sufficient characters to enable the correct and legible
 representation of *plain* text in any script (modern or historic).

This distinction has to be made everywhere (read: including in plain text),
otherwise the text is grammatically wrong.

- Original Message -
From: Yves Arrouye [EMAIL PROTECTED]
To: 'Stefan Persson' [EMAIL PROTECTED]; Karl Pentzlin
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: 31 January 2002 09:54
Subject: RE: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re:
Proposing Fraktur)

  quite a lot of space. However, Fraktur is already encoded in the
  Mathematical whatever-it's-called block. This variant selector would
  mean that lots of characters can be displayed in two *different* ways.
  I'd prefer that Fraktur diacritics were added instead, and that the
  mathematical letters were to be used for Fraktur texts.

 I hope not. These were encoded there because they convey a specific
 meaning when used for mathematics. If you use them to spell out names,
 then you're abusing them and potentially confusing software that would
 rely on their mathematical semantics.

Letters A through Z and ALPHA through OMEGA are used in *both* text and
mathematics, and I see no problem with this. Why would this cause problems
with the Fraktur letters in the Mathematical Alphanumeric Symbols block?

 I think it's time to have another proposal for French, FRENCH VARIANT
 SELECTOR, where we do not use Fraktur but some other font variation. And
 we may need a QUEBEC VARIANT SELECTOR if they have different rules... Or
 should it be a QUEBEC FRENCH VARIANT SELECTOR to show the relationship?

Do you have to use *both* kinds of characters at the same time in the same
document? In old Swedish you have to use *both* a's at the same time,
otherwise the text is grammatically wrong, be it so in plain text.

Stefan







When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-01-31 Thread Asmus Freytag

At 09:42 AM 1/30/02 +0100, Karl Pentzlin wrote:
The question is, are typesetting rules part of the script?

(I mean rules in the sense of obligatory regulations, not guidelines).

This distinction is a very German way of approaching the question.

If yes, (in my opinion) the plain text must carry the information that is
needed to follow them. If no, their execution can be left to higher level
protocols (which then have to decide whether a word is a foreign word
[to be set in Roman letters] or a name [to be set in Fraktur letters],
such at least according to German typesetting rules).

A more productive distinction would be along these lines:

a) is the feature necessary for correctly expressing the content
b) is the feature rule based, and
b.1) is the rule implementable w/o knowledge of semantics, or
c) when implementing the feature, is it necessary to
c.1) provide scope information, or
c.2) is local context sufficient

Looking at this list, roughly in reverse order:

Higher level protocols, understood as markup languages in particular,
do really well, when implementing something requires defining a scope,
since in them, all text data and the effect of all syntax are scoped
already.

If layout features can be determined algorithmically, it makes little
sense to add what can be derived from the existing text data, also into
the markup. Allowing for duplicate representation of information, always
allows the possibility of something getting out of step.

If semantic knowledge is required to implement a feature, this knowledge
must be supplied. If the extra information can be expressed as point-like,
local context, then it makes much *less* sense to use higher level markup
compared to character codes. Character codes, in a way, provide the ideal
representation of point like context in a data stream.

Finally, we get back to the original argument. Whether a typesetting
rule (and by rule I mean both conventions and legislated rules) is
supported by information added to the plain text or not, does not depend
on whether a national authority promulgates it, or whether it just
represents the consensus of the users of the language.

If, in practice, such a rule can be ignored, yet not change the meaning
of the text, it's a good candidate for not being implemented via plain
text. However, this is not absolute:

Leaving out italics from a document can not only change the level of
emphasis, but for example in English, there are occasional circumstances
where the use of italics removes a possible ambiguity in interpreting
a sentence. Nevertheless (except for mathematics) italics were left to
a higher level protocol (style markup).

Overriding bad hyphenation, or bad line breaks, is supported by SHY and
NBSP, even though hyphenation is not required at all to express the
content of a text, nor would bad line breaks e.g. after Dr. change
the meaning of the text.

In the latter two cases, character codes were added (fairly early) to
plain text, because using point-like context to support these very
common algorithms (hyphenation and linebreak) is an elegant solution,
while adding markup for the same purpose would be inelegant to the
extreme.
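
The point-like nature of these two characters can be shown with a toy break-opportunity finder in Python. This is a deliberately simplified sketch, not a real line-breaking algorithm: the function name and its two-rule logic are assumptions made for illustration; only the code points U+00A0 and U+00AD are real.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE: looks like a space, forbids a line break
SHY = "\u00ad"   # SOFT HYPHEN: marks a permitted hyphenation point

def break_opportunities(text):
    """Return the indices of characters after which a line may be broken.

    Toy rule set: break at an ordinary space or at a soft hyphen, never at
    a no-break space. Real line breakers consider far more than this.
    """
    return [i for i, ch in enumerate(text) if ch in (" ", SHY)]

# "Dr." is glued to the name; "hyphen" carries one explicit hyphenation point.
sample = "Dr." + NBSP + "Smith hy" + SHY + "phen"
```

For `sample`, the only opportunities are the ordinary space (index 9) and the soft hyphen (index 12); no scope has to be opened or closed anywhere, which is what makes a character code a natural fit here.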

Like everything else in character encoding, there are shades of gray,
and levels of gradation, so not everything is clear cut. But recognizing
up front that character codes may legitimately serve the support of
algorithms, even where the feature implemented by the algorithm is
merely common, and not absolutely and minimally required, is useful.

A./




Re: Proposing Fraktur

2002-01-31 Thread David Starner

On Thu, Jan 31, 2002 at 07:32:40PM +0100, Stefan Persson wrote:
 Do you have to use *both* kinds of characters at the same time in the same
 document? In old Swedish you have to use *both* a's at the same time,
 otherwise the text is grammatically wrong, be it so in plain text.

Being grammatically wrong implies that there's an error in the normal
form of the language - that is, the spoken form for most languages. And
I don't see it as any different from the rules that you must put the
titles of books in italics.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR(was: Re: Proposing Fraktur))

2002-01-31 Thread ろ〇〇〇〇 ろ〇〇〇

.. about Fraktur vs. Roman being a codepoint difference rather than a 
markup difference..

Like everything else in character encoding, there are shades of gray,
and levels of gradation, so not everything is clear cut. But recognizing
up front that character codes may legitimately serve the support of
algorithms, even where the feature implemented by the algorithm is
merely common, and not absolutely and minimally required, is useful.

A./


かたかなむようぞ！ひらがなをつかえる！！
(Katakana is unnecessary! I can use hiragana!!)

I DON'T NEED LOWERCASE! I CAN USE CAPITAL LETTERS!



→ じゅういっちゃん (Juuitchan) ←
 だんせいらしさむよう (No need for masculinity)




Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-30 Thread Karl Pentzlin

On Wednesday, 30 January 2002 at 00:39, Philipp Reichmuth wrote:

PR ... for example, in German hyphenation the consonant
PR cluster ck gets hyphenated as k-k under some circumstances. This
PR is a rule as well, but still it is a clear case where putting it into
PR the encoding by means of a hypothetical UNUSUAL HYPHENATION SELECTOR
PR would be a bit inappropriate.

This is a completely algorithmic decision. "Some circumstances" is
practically identical to "using the old (i.e. pre-1998) orthography" (at
least I don't know of a German compound word whose first part ends in -c and
whose second part starts with k-). The new orthography hyphenates before the
-ck. (Thus, the decision how to hyphenate ck applies to the whole text,
not to the individual position, and does not need to be marked there.)
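
As a rough Python illustration of why this is a per-text setting rather than a per-position mark: the function below is a hypothetical sketch that only shows how a break at a "ck" would be written under each orthography. It does not decide whether a break is permissible at that position (it would mishandle a word-final "ck", for example), and it is not a real hyphenator.

```python
def break_at_ck(word, orthography):
    """Show the first "ck" of `word` as it would appear when broken there.

    orthography: "old" (pre-1998, ck becomes k-k) or "new" (break before ck).
    """
    i = word.find("ck")
    if i == -1:
        return word  # nothing to do
    if orthography == "old":
        return word[:i] + "k-k" + word[i + 2:]
    if orthography == "new":
        return word[:i] + "-ck" + word[i + 2:]
    raise ValueError("orthography must be 'old' or 'new'")
```

With the word "Zucker", the old rule gives "Zuk-ker" and the new rule gives "Zu-cker". The only input besides the word is one document-wide parameter.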

PR  I think most of these cases, including
PR the Fraktur problem, deal with _typesetting_ rules and should thus be
PR left to _typesetting_ software, i.e. the now-famous higher level
PR protocol.

The question is, are typesetting rules part of the script?

(I mean rules in the sense of obligatory regulations, not guidelines).
If yes, (in my opinion) the plain text must carry the information that is
needed to follow them. If no, their execution can be left to higher level
protocols (which then have to decide whether a word is a foreign word
[to be set in Roman letters] or a name [to be set in Fraktur letters],
such at least according to German typesetting rules).

PR Would this mean much of an advantage over selecting a different font
PR for the respective character by means of markup?

The advantage is that you can encode text to be displayed correctly
(i.e. according to the obligatory typesetting rules) in Fraktur as
plain text. You even can display this text correctly in Fraktur or
Roman without change (as you can encode a Serbocroatian plain text to
be displayed in Latin or Cyrillic correctly without change).

Fraktur and Roman are script variants, not font variants. Both
script variants have a lot of fonts, but they are not fonts themselves.

If you regard the typesetting rules as part of the script, you can
look at Fraktur as a script variant which has four cases:
upper/lower for foreign words and upper/lower for the rest.
The former accidentally happen to look like the two cases of the Roman
script variant; thus you can use a Roman font for these two cases and
another real Fraktur letter font for the other two.
Cases could be left to higher level protocols, but for good reasons
they are not.

--
Karl Pentzlin
ACS Analysis Consulting  Software GmbH
München, Germany





RE: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-30 Thread Marco Cimarosti

Karl Pentzlin wrote:
 [...] (as you can encode a Serbocroatian plain text to
 be displayed in Latin or Cyrillic correctly without change).

I guess you are talking about old Yugoslav character sets, as this would not
be possible in Unicode.

Another case of a single encoding which overlaps more than one script is
ISCII, the Indian standard encoding.

 Fraktur and Roman are script variants, not font variants. Both
 script variants have a lot of fonts, but they are not fonts 
 themselves.

In rich text, you don't necessarily have to set a different font for roman
words in Fraktur text: the higher level protocol could be designed to have a
"roman" or "loanword" tag which is independent of font choice.

In plain text, I think that plane 14 language tags could be used: imagine
defining a language "old Swedish" and a sub-language "old Swedish/LOANWORD".
But I know that these language tags are not very popular, and perhaps I am
stretching their usage scope too much...
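
For readers unfamiliar with the mechanism, here is a small Python sketch of how a plane 14 language tag is spelled out: U+E0001 LANGUAGE TAG followed by tag characters that mirror ASCII at an offset of 0xE0000. The tag string `sv-x-loanword` is purely hypothetical (no such language identifier is defined anywhere), and later versions of the Unicode Standard discourage using these characters for language tagging at all.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset

def language_tag(ascii_tag):
    """Spell out an ASCII language identifier as plane 14 tag characters."""
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in ascii_tag)

def strip_tags(text):
    """Remove all plane 14 tag characters, leaving the visible text."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)

# Hypothetical tagging of one loan word inside an old Swedish text.
tagged = language_tag("sv-x-loanword") + "latin" + language_tag("sv")
```

`strip_tags(tagged)` gives back `latin`, so software that ignores the tags still sees ordinary text. The cost is one invisible code point per letter of every tag, plus one for each tag introducer.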

_ Marco




Re: Proposing Fraktur

2002-01-30 Thread Michael Bauer


 origin, while katakana and hiragana letters are very different and
 generally derive from completely different ideographs.
 Mark

Actually no. Of the 46 syllables, 31 have a shared root; only the derivation
is different (block writing for katakana and fast handwriting for hiragana)
... not quite what I'd call "generally" ; )

Mìcheal





RE: Proposing Fraktur

2002-01-30 Thread Marco Cimarosti

Michael Bauer wrote:
  origin, while katakana and hiragana letters are very different and
  generally derive from completely different ideographs.
  Mark

Mark or Marco? Well, anyway, the root is shared. :-)

 Actually no. Of the 46 syllables, 31 have a shared root, only
 the derivation is different (block writing for katakana and
 fast handwriting for hiragana)
 ... not quite what I'd call "generally" ; )

Oh, right! Although I count 48 syllables and 30 shared roots, that doesn't
change the basic fact that my "generally" is to be corrected to
"sometimes" or "often" at best...

 Mìcheal

Mìcheal or Michael? Well, anyway, the root is shared. :-)

_ Marco




Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-30 Thread David Starner

On Wed, Jan 30, 2002 at 09:42:08AM +0100, Karl Pentzlin wrote:
 The advantage is that you can encode text to be displayed correctly
 (i.e. according to the obligatory typesetting rules) in Fraktur as
 plain text. You even can display this text correctly in Fraktur or
 Roman without change (as you can encode a Serbocroatian plain text to
 be displayed in Latin or Cyrillic correctly without change).

What happens to the long s? That needs changing if you're talking about
Roman script since the 19th century.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-30 Thread Stefan Persson

- Original Message -
From: Karl Pentzlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 29 January 2002 23:39
Subject: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re:
Proposing Fraktur)


 While in Swedish this is a *tradition* according to Stefan, in German
 it is even a *rule*.

Also in Swedish, this was a rule. But from the end of the 18th century,
people began publishing books in Fraktur *only*, or antiqua *only*. In some
books, the antiqua part was written in italics instead. NOTE: This italic
thing should be considered as a glyph variant.

 Maybe something like a ROMAN VARIANT SELECTOR would be appropriate:

In any case, it'd be better to have *two* selectors, one to turn on Fraktur,
and a different one to turn it off. Otherwise, you'd have to put the variant
selector after *every* letter you want to be in antiqua, which would require
quite a lot of space. However, Fraktur is already encoded in the
Mathematical whatever-it's-called block. This variant selector would mean
that lots of characters can be displayed in two *different* ways. I'd prefer
that Fraktur diacritics were added instead, and that the mathematical
letters were to be used for Fraktur texts.

NOTE: Sometimes part of a word is in Fraktur, and a different part in
antiqua. Example: the Swedish word "latin" is a Latin loan word, and should
thus be written in antiqua. However, if you add the Swedish ending "-sk",
you'll get "latinsk" ("Latin-like"). The ending is Swedish and can, but
doesn't have to, be written in Fraktur. It's up to the author to decide
which.
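
The space argument for paired selectors can be checked with a few lines of Python. Everything here is hypothetical: no such selectors exist in Unicode, and the three Private Use Area code points are stand-ins chosen only so the example runs.

```python
# Hypothetical stand-ins from the Private Use Area (not real selectors).
ROMAN_SELECTOR = "\ue000"  # per-letter variant: follows every antiqua letter
FRAKTUR_OFF = "\ue001"     # paired variant: antiqua run starts here
FRAKTUR_ON = "\ue002"      # paired variant: back to Fraktur

def encode_per_letter(runs):
    """runs: list of (text, is_antiqua). Mark each antiqua letter individually."""
    return "".join(
        "".join(ch + ROMAN_SELECTOR for ch in text) if is_antiqua else text
        for text, is_antiqua in runs
    )

def encode_paired(runs):
    """Bracket each antiqua run with an off/on pair instead."""
    return "".join(
        FRAKTUR_OFF + text + FRAKTUR_ON if is_antiqua else text
        for text, is_antiqua in runs
    )

# "latinsk": loan-word stem in antiqua, Swedish ending in Fraktur.
latinsk = [("latin", True), ("sk", False)]
```

For "latinsk" (7 letters), the per-letter form needs 12 code points and the paired form 9. The gap grows with the length of the antiqua run, while the paired form always costs exactly two extra code points per run.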

 This selector could fulfill another important purpose:

 If this selector appears after a U+017F (long s), this character is
 only to be displayed as long s when it is (by means of a higher level
 protocol) to be displayed in Fraktur. Otherwise it is to be displayed
 as U+0073 (lower case s).

Long s is displayed as long s in antiqua words used in Fraktur Swedish.
So this wouldn't work. Instead, one would have to write s in German texts.

A comma after a Fraktur word is displayed as *either* "," or "/" (glyph
difference), while a comma after an antiqua word is *always* displayed as
",". So I guess that a Fraktur comma would also have to be added…

Stefan







Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-30 Thread Stefan Persson

- Original Message -
From: Karl Pentzlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 30 January 2002 09:42
Subject: Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re:
Proposing Fraktur)


 PR  I think most of these cases, including
 PR the Fraktur problem, deal with _typesetting_ rules and should thus be
 PR left to _typesetting_ software, i.e. the now-famous higher level
 PR protocol.

 The question is, are typesetting rules part of the script?

 (I mean rules in the sense of obligatory regulations, not guidelines).
 If yes, (in my opinion) the plain text must carry the information that is
 needed to follow them. If no, their execution can be left to higher level
 protocols (which then have to decide whether a word is a foreign word
 [to be set in Roman letters] or a name [to be set in Fraktur letters],
 such at least according to German typesetting rules).

In this case:

* The program would have to know which language it's dealing with, and which
  spelling rules are used in the text (in Swedish: free spelling (as
  preferred), pre-1905, and post-1905).
* It would have to know every loan word and personal name.

Here's a difficult case:

* "Et": Latin word. Used in Swedish in cases such as "et cetera". Written in
  antiqua.
* "Et": old spelling for "ett" ("a", "one"). Written in Fraktur.

How would the program know which of them I'm referring to?
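
The difficulty can be made concrete with a toy Python lookup. The lexicon below is a hypothetical three-entry stand-in for the complete word list such a program would need; it is not real data, and the function name is made up.

```python
# Hypothetical toy lexicon: spelling -> scripts in which the word may be set.
LEXICON = {
    "et": {"antiqua", "fraktur"},  # Latin "et" vs. old spelling of "ett"
    "cetera": {"antiqua"},         # Latin loan word
    "hus": {"fraktur"},            # native Swedish word
}

def script_for(word):
    """Return the script for `word`, or None if spelling alone cannot decide."""
    candidates = LEXICON.get(word.lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # unknown or ambiguous: semantic knowledge is required
```

`script_for("cetera")` is "antiqua", but `script_for("Et")` is `None`: the spelling alone cannot tell the two words apart, so the choice has to be supplied by the author or by markup.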

Stefan







Re: Proposing Fraktur

2002-01-30 Thread Kenneth Whistler

Stefan Persson wrote:

 AFAIK, the criteria for adding any character to the Standard is that there
 should be a difference between the character and all the other characters
 already supported by the Standard. Here we have such a difference; doesn't
 this mean that Fraktur ought to be added to the Standard?

Asmus pretty thoroughly laid out the issues for kana and Fraktur. I won't
say anything further about that.

But stepping back a little further, I would like to point out that the
assertion that:

  the criteria for adding any character to the Standard is that there
   should be a difference between the character and all the other characters
   already supported by the Standard  <ital>ipsissima verba</ital> <== irony warning

begs the questions which arise about the identity of the character in
the first place.

Every marking on paper (or papyrus, or clay, or stone, for that matter)
is not necessarily a character deserving of encoding as a character
in the universal character encoding, even if I can show systematic differences
between it and existing characters in the standard.

On the one hand, one must show that the differences don't fall within the
range of acceptable variation for an already existing encoded character.
And one must show that the entity in question has some verifiable
existence as an abstract character, or that some processing requirement
forces consideration of its encoding as a character.

Merely being a distinct glyph is not enough.

 And so what? I thought the meaning of Unicode was that all languages should
 be fully supported in plain text, using one single font to display all of
 the characters. With old Swedish, this isn't possible.

I think this misconstrues the mission of Unicode as an encoding. The goal
is to encode sufficient characters to enable the correct and legible
representation of *plain* text in any script (modern or historic).

The goal is not and has never been to enable the plain text representation
of *all* extant and future texts of any form. For that, markup, high-level
layout, and font selection have always been required.

 Again: one language, one font.

No. One font is sufficient for monofont display of a language, tautologously. 
But there is no presumption that any and all text in a language need be
displayed in a single font, or that such a goal would even be desirable.

--Ken

 
 Stefan




RE: Proposing Fraktur

2002-01-29 Thread Marco Cimarosti

Stefan Persson wrote:
 In old Swedish there was a tradition of writing words of 
 foreign origin in the Roman type of letters (in Swedish
 referred to as antikva), while the rest of the words
 were written in Fraktur.

I have seen the same usage in German, on an old Duden dictionary: words of
foreign origins and etymologies were in Roman, the rest being in Fraktur.

 This is similar to the difference between katakana and
 hiragana/kanji in modern Japanese.

And a similar difference is used in all modern European languages: roman for
normal text and italics for foreign words.

But notice that roman, italics and Fraktur all look alike and share a common
origin, while katakana and hiragana letters are very different and generally
derive from completely different ideographs.

 [...] I know that the letters A-z are already supported
 in the Mathematical Alphanumeric Symbol block (and some
 in the Letterlike Symbols block), 

AFAIK, those characters should not be used to compose text: they are
supposed to be *symbols* to be used by mathematicians too busy to set a
different font. ;-)
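
A practical reason these symbols make a poor text alphabet is that the Fraktur capitals are split across the two blocks mentioned above. The Python sketch below checks this with the standard `unicodedata` module; the function name is made up for illustration.

```python
import unicodedata

MATH_FRAKTUR_CAPITAL_A = 0x1D504  # first Fraktur capital in the math block

def missing_math_fraktur_capitals():
    """List the capitals whose slot in the mathematical Fraktur run is unassigned."""
    missing = []
    for offset in range(26):
        code_point = MATH_FRAKTUR_CAPITAL_A + offset
        # unicodedata.name returns the default (None) for unassigned code points.
        if unicodedata.name(chr(code_point), None) is None:
            missing.append(chr(ord("A") + offset))
    return missing
```

The function reports C, H, I, R and Z as missing from the run. Those letters were encoded earlier in the Letterlike Symbols block (for example U+212D BLACK-LETTER CAPITAL C), so even a plain A to Z alphabet cannot be produced from one contiguous range.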

_ Marco




Re: Proposing Fraktur

2002-01-29 Thread Stefan Persson

- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
To: 'Stefan Persson' [EMAIL PROTECTED]; Unicode-listan
[EMAIL PROTECTED]
Sent: 29 January 2002 19:39
Subject: RE: Proposing Fraktur


 Stefan Persson wrote:
  In old Swedish there was a tradition of writing words of
  foreign origin in the Roman type of letters (in Swedish
  referred to as antikva), while the rest of the words
  were written in Fraktur.

 I have seen the same usage in German, on an old Duden dictionary: words of
 foreign origins and etymologies were in Roman, the rest being in Fraktur.
 [...]
 And a similar difference is used in all modern European languages: roman
 for normal text and italics for foreign words.

The only case where I've seen this in use is for some special phrases of
French origin when used in English. Besides, this is no rule (i.e. you don't
have to use italics), while this rule was applied to *all* occurrences of such
words in old Swedish.

 But notice that roman, italics and Fraktur all look alike and share a
 common origin, while katakana and hiragana letters are very different and
 generally derive from completely different ideographs.

And so what? I thought the meaning of Unicode was that all languages should
be fully supported in plain text, using one single font to display all of
the characters. With old Swedish, this isn't possible.

  [...] I know that the letters A-z are already supported
  in the Mathematical Alphanumeric Symbol block (and some
  in the Letterlike Symbols block),

 AFAIK, those characters should not be used to compose text: they are
 supposed to be *symbols* to be used by mathematicians too busy to set a
 different font. ;-)

Again: one language, one font.

Stefan







Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-29 Thread Karl Pentzlin

Am Dienstag, 29. Januar 2002 um 17:07 schrieb Stefan Persson:

SP In old Swedish there was a tradition of writing words of foreign origin in
SP the Roman type of letters (in Swedish referred to as antikva), while the
SP rest of the words were written in Fraktur. ...

Am Dienstag, 29. Januar 2002 um 19:39 schrieb Marco Cimarosti:

MC I have seen the same usage in German, on an old Duden dictionary: words of
MC foreign origins and etymologies were in Roman, the rest being in Fraktur.

This is still valid for Fraktur typesetting according to the *current* Duden,
at least as of the 1996 edition which I have (21st edition; the one which
introduced the new German orthography which became effective in 1998).
See page 66.
The Duden uses e.g. the following example:
Das sinkende Schiff sandte SOS-Rufe. (The sinking ship emitted SOS calls.)
fff Ffff ff Ff aaaff
(f=Fraktur (i.e. Blackletter), a=Antiqua (i.e. Roman), F=U+017F in Fraktur)

While in Swedish this is a *tradition* according to Stefan, in German
it is even a *rule*. The Duden says:
"Fremdsprachige Wörter und Wortgruppen ... sind im Fraktursatz als
Antiqua zu setzen", i.e. "Words of foreign languages and groups of
them ... have to be typeset in Roman within Fraktur typesetting."

This may be an argument proving that the Fraktur/Roman
differentiation can be a matter of text rather than of higher level
protocols, as in fact claimed by Stefan.
On the other hand, Fraktur is too obviously a variant of the Latin
script to be encoded separately.

Maybe something like a ROMAN VARIANT SELECTOR would be appropriate:

If this appears after a character which would otherwise (by means of a
higher level protocol) be displayed in Fraktur, that character is to be
displayed in Roman. In other circumstances, this selector can be ignored.

This selector could fulfill another important purpose:

If this selector appears after a U+017F (long s), this character is
only to be displayed as long s when it is (by means of a higher level
protocol) to be displayed in Fraktur. Otherwise it is to be displayed
as U+0073 (lower case s).

This would allow German (and maybe Swedish etc.) texts to be encoded
in a way that they can be displayed correctly in Fraktur as well as in
Roman. The German orthographic rules require a normal s (U+0073) in
Roman script in those positions where a long s (U+017F) is required in
Fraktur script.
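
The two rules above can be sketched in a few lines of Python. This is only an
illustration of the proposal, not an existing mechanism: no ROMAN VARIANT
SELECTOR exists in Unicode, so a private-use code point (U+E000) stands in for
it, and the function name and the "fraktur"/"roman" style labels are made up
for the example. The sketch also assumes that the long s rule takes precedence
over the general rule when both could apply.

```python
# Hypothetical: a private-use code point stands in for the proposed selector.
ROMAN_VARIANT_SELECTOR = "\uE000"
LONG_S = "\u017F"


def resolve_styles(text, fraktur):
    """Return (character, style) pairs for a run of text.

    `fraktur` is the decision of the higher level protocol: True if the
    run is to be displayed in Fraktur, False if it is displayed in Roman.
    """
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ROMAN_VARIANT_SELECTOR:
            # A selector with nothing to apply to is simply ignored.
            i += 1
            continue
        selected = i + 1 < len(text) and text[i + 1] == ROMAN_VARIANT_SELECTOR
        if ch == LONG_S and selected:
            # Long s stays long s only in Fraktur; in Roman it becomes U+0073.
            result.append((LONG_S, "fraktur") if fraktur else ("s", "roman"))
        elif selected and fraktur:
            # Marked character inside a Fraktur run: show it in Roman.
            result.append((ch, "roman"))
        else:
            result.append((ch, "fraktur" if fraktur else "roman"))
        i += 2 if selected else 1
    return result
```

With "sandte SOS" encoded as long s plus selector, then "andte ", then each
letter of SOS followed by the selector, a Fraktur rendering keeps the long s
and sets SOS in Roman, while a Roman rendering shows a plain s throughout.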

--
Karl Pentzlin
ACS Analysis Consulting  Software GmbH
München, Germany





Re: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-01-29 Thread Philipp Reichmuth

Hello Karl and others,

KP While in Swedish this is a *tradition* according to Stefan, in German
KP it is even a *rule*. The Duden says:
KP Fremdsprachige Wörter und Wortgruppen ... sind im Fraktursatz als
KP Antiqua zu setzen, i.e. Words of foreign languages and groups of
KP them ... have to be typeset in Roman within Fraktur typesetting.
KP This may be an argument proving that the Fraktur/Roman
KP differentiation can be a matter of text rather than of higher level
KP protocols, as in fact claimed by Stefan.

On the other hand, for example, in German hyphenation the consonant
cluster ck gets hyphenated as k-k under some circumstances. This
is a rule as well, but still it is a clear case where putting it into
the encoding by means of a hypothetical UNUSUAL HYPHENATION SELECTOR
would be a bit inappropriate. I think most of these cases, including
the Fraktur problem, deal with _typesetting_ rules and should thus be
left to _typesetting_ software, i.e. the now-famous higher level
protocol.
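
To make the comparison concrete, here is a rough Python sketch of how
typesetting software could apply the ck rule at line-break time while the
stored text stays untouched. The function name and the idea of passing the
break position as an index are assumptions for the example; a real hyphenation
engine would also decide where breaks are allowed, which this sketch does not
attempt.

```python
def break_word(word, pos):
    """Split `word` for a line break before index `pos`.

    Returns (head, tail), where head carries the trailing hyphen. If the
    break falls between the c and the k of a "ck" cluster, the c is
    written as k, following the traditional German rule.
    """
    head, tail = word[:pos], word[pos:]
    if head.endswith("c") and tail.startswith("k"):
        head = head[:-1] + "k"
    return head + "-", tail
```

For example, break_word("Zucker", 3) gives ("Zuk-", "ker"), while the encoded
word remains "Zucker" and is shown as such whenever no break occurs there.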

KP If this appears after a character which is (by means of a higher level
KP protocol) to be displayed in Fraktur otherwise, that character is to be
KP displayed in Roman. In other circumstances, this selector can be ignored.

Would this mean much of an advantage over selecting a different font
for the respective character by means of markup?

  Philipp  mailto:[EMAIL PROTECTED]





RE: Proposing Fraktur

2002-01-29 Thread Murray Sargent

David Starner said:
 
Fraktur is not a different script from the Latin script, and therefore is
not encoded separately.
 
True, but Fraktur math characters are encoded in plane 1 for use in mathematics. These
characters are not intended to be used for natural language purposes (unless you think
of mathematics as a natural language :-), in which case it's probably the only truly
international natural language).
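
A small Python check, using only the standard unicodedata module, shows what
these characters look like to software. The helper name describe is made up
for the example. It reports a character's name, its plane, and its NFKC
normalization; the normalization folds the math letter back to a plain Latin
letter, which is one sign that the Fraktur shape is treated as a font-like
variant and not as a separate script.

```python
import unicodedata


def describe(ch):
    """Return (character name, plane number, NFKC-normalized form)."""
    return (
        unicodedata.name(ch),
        ord(ch) >> 16,  # plane = code point divided by 0x10000
        unicodedata.normalize("NFKC", ch),
    )


# U+1D504 lives in plane 1 (Mathematical Alphanumeric Symbols).
print(describe("\U0001D504"))  # ('MATHEMATICAL FRAKTUR CAPITAL A', 1, 'A')
# The Fraktur capital C was encoded earlier, in Letterlike Symbols (plane 0).
print(describe("\u212D"))  # ('BLACK-LETTER CAPITAL C', 0, 'C')
```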
 
Thanks
Murray




Re: Proposing Fraktur

2002-01-29 Thread Asmus Freytag

Kana (Hiragana/Katakana):
Two (essentially) iso-phonic(?) systems, where each symbol
in one set has a corresponding symbol in the other set,
both denoting the same sound value.

The sets of forms are historically unrelated.

There is little overlap in the forms.

Competent readers will know both sets, but will learn
them separately.

Convention decides which set to use, but innovative uses
are known that flout these conventions. Use of Katakana
for foreign words is conventional.

Having longer texts (book length) available in both
forms, however, is very uncommon (never say never).

The rules of layout are identical, spelling rules
differ in the demarcation of vowel length.

Widespread daily modern use

Monofont support is a practical everyday requirement

Encoded as two scripts  

Latin (Fraktur/Roman/italic):
Three isophonic systems

Forms historically related

Some overlap in the forms
(Some forms of Fraktur have what I call 'embellished roman'
 capitals, instead of true Fraktur shapes.)

Knowledge of roman/italic only is widespread, but reading Fraktur
can be self-taught.

Convention decides which one to use, when they occur together,
but innovative uses of Fraktur are common for names and titles,
and misuse of italics is rampant. Use of roman for foreign words
is a common feature of Fraktur texts. Use of italic for emphasis
is a common feature of roman texts.

Books published in Fraktur have commonly been republished in
roman style, as Fraktur has fallen out of common use.

The rules of layout (ligating, hyphenating, etc.) are different.

In Fraktur, emphasis is denoted by s e p a r a t i n g the letters,
whereas in many languages w/o a Fraktur tradition, italics have
taken on this role, and character spacing is used to justify lines.
For languages with Fraktur tradition, separation is still used
with roman, and automatic use for character spacing is an example
of poor localization (!).

No longer in widespread use. Limited to attention grabbing
(titles, names) and specialized (math) uses.

Monofont support is not an everyday requirement, except
in specialized notation (mathematics).

Encoded as one script plus extension for mathematics.

I think this is a complete summary. My belief is that if Fraktur was still
common today, and more commonly used together with roman, and/or if Japanese
usage rules for Kana were somewhat different, then the resulting encodings
might well have been different in each case.

From a purely rich-text point of view there is nothing that prevents treating
the Kana as a single script. On the other hand, the layout rule differences
make a simple font substitution awkward for Fraktur text of any length. So
does the use of a length mark for vowels in Katakana, vs. vowel doubling in
Hiragana. No such issues exist for roman/italic.
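
The one-to-one correspondence between the two Kana sets, and the place where
it breaks down, can be shown with a short Python sketch. It relies on the fact
that the Katakana block is laid out 0x60 code points above the Hiragana block;
the function names are invented for the example, and the conversion is
deliberately naive.

```python
KANA_OFFSET = 0x60  # Katakana letters sit 0x60 above their Hiragana twins
HIRAGANA_RANGE = (0x3041, 0x3096)
KATAKANA_RANGE = (0x30A1, 0x30F6)


def to_katakana(text):
    """Map each Hiragana letter to the corresponding Katakana letter."""
    lo, hi = HIRAGANA_RANGE
    return "".join(
        chr(ord(c) + KANA_OFFSET) if lo <= ord(c) <= hi else c for c in text
    )


def to_hiragana(text):
    """Map each Katakana letter to the corresponding Hiragana letter."""
    lo, hi = KATAKANA_RANGE
    return "".join(
        chr(ord(c) - KANA_OFFSET) if lo <= ord(c) <= hi else c for c in text
    )


print(to_katakana("ひらがな"))  # ヒラガナ
print(to_hiragana("ラーメン"))  # らーめん (length mark U+30FC left as-is)
```

The second call illustrates the spelling difference: the letter-by-letter
mapping leaves the length mark in place, whereas conventional Hiragana
spelling would double the vowel instead, so a pure glyph substitution does
not yield conventionally spelled text.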

A./

PS: The set of 'scripts' unified with Latin is in fact a bit larger, if
manuscript and handwriting styles are considered as well. Some handwriting
styles (Suetterlin) are so different that considerable training is required
to read them.

PPS: I don't care to distinguish between 'conventions' and 'rules'. A tendency
of considering conventions as/in terms of rules is quite conventional in
Germany ;-)