Re: Accessing the WG2 document register

2015-06-10 Thread John Hudson
Anshu, I simply treat WG2 as a bureaucratic exercise bolted onto the 
actual work that Unicode does. In 20 years, I have never once had 
occasion to refer to ISO 10646, while I refer to Unicode every day. When 
I visit clients, none of them talk about implementing ISO 10646; they 
all talk about implementing Unicode.


My recommendation is simply to ignore WG2 and act as if it doesn't 
exist. It already might as well not, and with its policies it is only 
likely to become more and more irrelevant.



JH


--

John Hudson
Tiro Typeworks Ltd    www.tiro.com
Salish Sea, BC        t...@tiro.com

Getting Spiekermann to not like Helvetica is like training
a cat to stay out of water. But I'm impressed that people
know who to ask when they want to ask someone to not like
Helvetica. That's progress. -- David Berlow


Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-05 Thread John Hudson

Mahesh T. Pai wrote:


  PUA isn't necessary, and a font technology that handles elements of
  complex script shaping by referencing PUAs isn't fundamentally any
  different from one that uses glyph names or another identifier and
  leaves the glyph unencoded.



Does one exist? Does it work? (we will leave out the acceptance /
popularity part). 


I recall old Arabic layout models that relied on AFII glyph naming, but 
there are good reasons why such systems were never widely popular and 
have not persisted. There are also custom Indic layout models still in 
use that rely on fonts made in particular ways with particular glyph 
sets, which I would consider similar in that the layout intelligence 
lies outside the font in the software (in this case in plug-ins for 
InDesign). Such models tend to be intrinsically limited in their 
capabilities, because the fonts must contain very particular collections 
of glyphs named (or encoded) in particular ways. Smart font formats such 
as OpenType, AAT and Graphite are much more flexible, because one can 
solve layout problems in more than one way. For example, I have recently 
been working on an Odia (Oriya) font in which I needed vertically 
shortened forms of letters in conjunct-initial positions due to 
technical limitations of the target environment in which the font is 
used. These are accessed using OpenType contextual GSUB lookups. This is 
an example of a layout solution that is design-specific, and which would 
not be possible if I were working in a layout model with a fixed set of 
recognised identifiers.


I don't know why OT Layout is not yet implemented in Android phones. I 
can think of a number of possible reasons, a combination of which might 
apply. One is that the developers simply have not done the work yet, but 
intend to. Another is that they have concerns about font size on mobile 
devices, which has delayed support for fonts with large layout tables. 
Another is that they have security concerns about OTL tables in fonts 
(Google's webfont sanitiser was stripping OTL tables from fonts served 
to Chrome for this reason, as I understand; I'm not sure if this has 
changed yet).


JH



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-05 Thread John Hudson

Christopher Fynn wrote:


OpenType is an openly available specification for fonts which anyone
can use without paying a licence to Adobe or Microsoft, who maintain
the specification.


While Microsoft and Adobe maintain the OT specification, it should be 
noted that the OT spec is in sync with the Open Font Format 
specification, an ISO spec (ISO/IEC 14496-22) developed under MPEG. The 
two specifications are not formally required to remain in sync, but this 
appears to be everyone's intention, and both Microsoft and Adobe have 
been using the ISO standards process to develop and publish updates to 
the spec.


JH



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-04 Thread John Hudson

Mahesh T. Pai wrote:


It is another matter that no font actually uses a non-OpenType layout,
which basically requires putting the non-encoded glyphs in the
Private Use Area (PUA) and then calling the glyphs by name. 


PUA isn't necessary, and a font technology that handles elements of 
complex script shaping by referencing PUAs isn't fundamentally any 
different from one that uses glyph names or another identifier and 
leaves the glyph unencoded.


What OpenType does is to provide, in the font, some of the 
glyph-to-glyph mapping and positioning data that would otherwise have to 
reside elsewhere in the OS-app-font matrix.
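
As a rough illustration of that division of labour, here is a toy 
sketch in Python -- invented glyph names and tables, not any real 
font or shaping engine -- of characters going through a cmap-style 
mapping and then a GSUB-style ligature substitution:

# Toy model of the division of labour: the cmap maps characters to
# glyphs, then GSUB-style lookups map glyphs to other glyphs.
# Glyph names and tables are invented for illustration only.

CMAP = {"f": "f", "i": "i", "l": "l"}               # character -> glyph
LIGATURES = {("f", "i"): "f_i", ("f", "l"): "f_l"}  # many-to-one GSUB

def shape(text):
    glyphs = [CMAP[c] for c in text]    # character -> glyph (cmap)
    out, i = [], 0
    while i < len(glyphs):              # glyph -> glyph (GSUB)
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(shape("fil"))   # ['f_i', 'l']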


JH



Re: RTL PUA?

2011-08-23 Thread John Hudson

Philippe Verdy verd...@wanadoo.fr wrote:


The computing order of features should not then be:
 - BiDi algorithm for reordering grapheme clusters
 - font search and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - GPOS



but really:



 - font lookup and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - BiDi algorithm for reordering glyphs representing the grapheme
clusters or ligatured grapheme clusters
 - GPOS


I can see the advantages of such an approach -- performing GSUB prior to 
BiDi would enable cross-directional contextual substitutions, which are 
currently impossible -- but the existing model in which BiDi is applied 
to characters *not glyphs* isn't likely to change. Switching to 
processing GSUB lookups in logical order rather than reading order would 
break too many things.
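
To make the ordering question concrete, here is a toy Python sketch -- 
grossly simplified BiDi, invented character directions, nothing like a 
real implementation -- showing how the standard model segments the text 
into directional runs before any GSUB is applied, so a lookup can never 
see across a run boundary:

# Toy sketch: BiDi segments the character string into directional runs
# first, and GSUB is applied to each run's glyphs independently.
# Directions and glyph names are invented for illustration.

DIR = {"A": "R", "B": "R", "x": "L", "y": "L"}   # per-character direction

def bidi_runs(text):
    # Split logical text into directional runs (grossly simplified).
    runs, cur, cur_dir = [], "", None
    for ch in text:
        if DIR[ch] != cur_dir and cur:
            runs.append((cur, cur_dir))
            cur = ""
        cur, cur_dir = cur + ch, DIR[ch]
    runs.append((cur, cur_dir))
    return runs

def gsub_per_run(run, direction):
    glyphs = list(run)
    if direction == "R":
        glyphs.reverse()   # RTL runs are processed in resolved order
    # ... GSUB lookups would apply here, seeing only this run ...
    return glyphs

for run, d in bidi_runs("ABxy"):   # 'AB' is RTL, 'xy' is LTR
    print(d, gsub_per_run(run, d))
# R ['B', 'A']
# L ['x', 'y']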


JH


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-23 Thread John Hudson

Behdad Esfahbod wrote:


I can see the advantages of such an approach -- performing GSUB prior to BiDi
would enable cross-directional contextual substitutions, which are currently
impossible -- but the existing model in which BiDi is applied to characters
*not glyphs* isn't likely to change. Switching to processing GSUB lookups in
logical order rather than reading order would break too many things.



You can't get cross-directional-run GSUB either way because by definition
GSUB in an RTL run runs RTL, and GSUB in an LTR run runs LTR.  If you do it
before Bidi, you get, e.g., kerning between two glyphs which end up being
reordered far apart from each other.  You really want GSUB to be applied on the
visual glyph string, but which direction it runs is a different issue.


Kerning is GPOS, not GSUB.

But generally I agree. My point was that Philippe's suggestion, although 
it could be the basis of an alternative form of layout that might have 
some benefits if fully worked out, is a radical departure from how 
OpenType works.


J.


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-23 Thread John Hudson

Philippe Verdy wrote:


Rereading closely the OpenType spec...


I suggest you read also the script-specific OT layout specifications.

http://www.microsoft.com/typography/SpecificationsOverview.mspx

You'll note, for example, that the Arabic font spec doesn't even mention 
BiDi, because it is assumed that this has been resolved before glyph 
runs for OTL processing are even identified. This makes sense to me 
because BiDi is a character-centric operation.


The Microsoft font specs describe what Uniscribe (and DWrite) do with 
text and fonts for particular scripts, and there may be some differences 
in other implementations. For example, Uniscribe performs invalid mark 
sequence checks that others, preferring to see this as a task for 
spellcheckers, do not. But the glyph selection and positioning results 
should be the same across implementations. Font makers need to know how 
text is processed and OTL features applied in order to make fonts that 
work with the resulting glyph runs and input strings. Changing the point 
in glyph string resolution at which BiDi is applied breaks everything. 
It's a complete non-starter.


JH


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-23 Thread John Hudson
Philippe, I'll need to think about this some more and try to get a 
better grasp of what you're suggesting. But some immediate thoughts come 
to mind:


If BiDi is to be applied to shaped glyph strings, surely that means 
needing to step backwards through the processing that arrived at those 
shaped glyph strings in order to correctly identify their relationship 
to underlying character codes, since it is the characters, not the 
glyphs, that have directional properties. There's nothing in an OT font 
that says e.g. GID 456 /lam_alif.fina/ is an RTL glyph, so the 
directionality has to be processed at the character level and mapped up 
through the GSUB features to the glyphs.


I think you may be right that quite a lot of existing OTL functionality 
wouldn't be affected by applying BiDi after glyph shaping: logical order 
and resolved order are often identical in terms of GSUB input. But it is 
in the cases where they are not identical that there needs to be a 
clearly defined and standard way to do things on which font developers 
can rely. [A parallel is canonical combining class ordering and GPOS 
mark positioning: there are huge numbers of instances, even for quite 
complicated combinations of base plus multiple marks, in which it really 
doesn't matter what order the marks are in for the typeform to display 
correctly; but there are some instances in which you absolutely need to 
have a particular mark sequence.]


I've lost track of what the putative benefit of processing BiDi post 
glyph shaping is. I think I missed part of your earlier exchange with 
Behdad.



JH


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-22 Thread John Hudson

Shriramana Sharma wrote:

The font tables themselves contain only ASCII characters I presume. 


OpenType Layout tables use Glyph IDs. OTL development tools typically 
use glyph names, which may be particular to the tool or the same names 
used in the post or CFF tables.


OTL tables work on glyphs, not characters, and bidi will have been 
resolved prior to application of OTL substitution and positioning. Input 
glyph strings for substitution lookups are always in the resolved 
direction of the glyph run, so Arabic and Hebrew alphabetic runs are 
processed right-to-left, i.e.


alef lamed -> alef_lamed

*not*

lamed alef -> alef_lamed

Similarly, context strings for glyph positioning (if present) will be 
right-to-left, although anchor attachment positions on individual glyphs 
are relative to the 0,0 coordinate, i.e. the left sidebearing.
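
A minimal Python sketch of such a ligature lookup -- a toy substitution 
function with illustrative glyph names, the input already in the 
resolved RTL direction of the run:

def apply_ligature(glyphs, sequence, ligature):
    # Scan the glyph string in the resolved direction of the run and
    # replace each occurrence of `sequence` with the ligature glyph.
    out, i = [], 0
    while i < len(glyphs):
        if glyphs[i:i + len(sequence)] == sequence:
            out.append(ligature)
            i += len(sequence)
        else:
            out.append(glyphs[i])
            i += 1
    return out

run = ["alef", "lamed"]   # resolved (RTL) order, not logical order
print(apply_ligature(run, ["alef", "lamed"], "alef_lamed"))
# ['alef_lamed']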


JH



--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-22 Thread John Hudson

Shriramana Sharma wrote:

I was just noting 
that the glyph tables themselves don't *use* the actual codepoints of 
the characters getting ligated (while they *refer* to them).


Characters are mapped to glyph IDs in the font cmap tables.

Glyph IDs are mapped to other glyph IDs (one-to-one, one-to-many, 
many-to-one, or one-to-one-of-many) in the layout GSUB table.


No! See Behdad's post -- it is clearly said that the lookup will still 
be in logical order (1001, 1012) -> (1540) and not in visual order as 
you say.


I think there may be some confusion in this discussion over what 
constitutes 'visual order'. I try to avoid the term because it is 
difficult for right-to-left readers to accustom themselves to thinking 
of visual order as anything other than right-to-left. I prefer the term 
'reading order' or 'resolved order', i.e. resolved bidi and script 
shaping order, which may have involved integrated reordering (reordering 
within the glyph processing) as in the case of Indic scripts.


Nope -- they are placed in the lookup table in *logical* order. IIUC the 
entire sequence of glyphs is only reordered from RTL at the very end. 
Peter or Behdad, can you corroborate this?


Glyph ID inputs for OTL processing are according to reading/resolved 
order. This is typically the same as logical order, but the term logical 
order really applies to character strings, not glyph strings, which are 
much more malleable. The order of input strings in GSUB lookups or 
contexts is dependent not only on the underlying character order, but 
also on the results of previous GSUB lookups. So while, unlike AAT and 
Graphite, OpenType Layout doesn't explicitly provide for glyph 
re-ordering, some kinds of glyph reordering are possible using sequences 
of contextual lookups to duplicate a glyph in a second location in the 
string and then remove the first instance. We use this in some 
Devanagari fonts to enable subsequent ligation of short ikar variants to 
the left of a consonant base with reph marks to the right of that base.
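
A toy Python version of that duplicate-then-remove trick -- two 
sequenced lookups modelled as plain functions, with invented glyph 
names standing in for the contextual one-to-many and many-to-one 
substitutions a real font would use:

def pass1_duplicate(glyphs):
    # Contextual one-to-many: ka -> i_matra ka, when followed by i_matra.
    out = []
    for i, g in enumerate(glyphs):
        if g == "ka" and i + 1 < len(glyphs) and glyphs[i + 1] == "i_matra":
            out += ["i_matra", "ka"]
        else:
            out.append(g)
    return out

def pass2_remove(glyphs):
    # Many-to-one: ka i_matra -> ka, removing the original matra.
    out, i = [], 0
    while i < len(glyphs):
        if glyphs[i:i + 2] == ["ka", "i_matra"]:
            out.append("ka")
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(pass2_remove(pass1_duplicate(["ka", "i_matra"])))
# ['i_matra', 'ka']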


JH



--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-21 Thread John Hudson

Jonathan Rosenne wrote:


People do all kinds of fancy things. I guess old manuscripts contain many
ligatures...


Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.


JH


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: RTL PUA?

2011-08-21 Thread John Hudson

Petr Tomasek wrote:

Not in Hebrew. The only common ligature is the aleph_lamed, a 
post-classical import from Judaeo-Arabic.



Not true. See:
Colette Sirat. Hebrew Manuscripts of the Middle Ages. Cambridge University 
Press 2002,
fig. 114 (p. 176) or fig. 127 (p. 189) or fig. 134 (p. 193).


I wouldn't classify any of those examples as 'common'. I also wouldn't 
classify all examples of touching letters -- of which many occur in 
rapidly written text -- as ligatures. Aleph+lamed on the other hand is a 
regularly occurring distinct formation in whole classes of manuscripts 
(and persisting in typography). I have a good collection of books on 
Hebrew palaeography, and while there are many examples of Hebrew letters 
being very tightly spaced there are relatively few instances of what I 
would consider ligatures, i.e. formations in which the ductus or spacing 
of the specific sequences of letters is modified to facilitate connection.


JH


--

Tiro Typeworks    www.tiro.com
Gulf Islands, BC  t...@tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
 - Sidney Harring, _Policing a Class Society_



Re: [OT?] Uniscribe for Malayalam and Oriya

2004-12-21 Thread John Hudson
Marco Cimarosti wrote:
Hallo everybody. I hope this is not (too) OT. In case it is, can somebody
please redirect me to a more appropriate forum?
Good places to ask this question are the OpenType list:
Subscribe: [EMAIL PROTECTED]
Unsubscribe: [EMAIL PROTECTED]
Set list to inactive: [EMAIL PROTECTED]
Set list to active: [EMAIL PROTECTED]
Message mode: [EMAIL PROTECTED]
Digest mode: [EMAIL PROTECTED]
Or the VOLT user community:
http://groups.msn.com/MicrosoftVOLTuserscommunity
A beta version of Uniscribe is available from the VOLT community site, but obviously 
should be used only for font testing. The most recent release version of Uniscribe is that 
which ships with Office 2003.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The peasant of the Garonne, by Jacques Maritain
The land of Ulro, by Czeslaw Milosz


Re: [hebrew] Re: proposals I wrote (and also, didn't write)

2004-12-09 Thread John Hudson
Mark E. Shoulson wrote:
(I suppose there could be a fuzzy line between those.  What do you say 
about a mark that always appears at the end of a word kinda 
over-and-to-the-left of the last letter?  Like, say, Zarqa in Masoretic 
Hebrew?  Is it a spacing character after the word or a mark on the 
letter?  In the case of Zarqa, it's clearly a combining mark on the 
letter, based on other accents, printing, and general perception through 
the years.  But in general?)
I don't know, I'm not a generalist :)
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: proposals I wrote (and also, didn't write)

2004-12-07 Thread John Hudson
E. Keown wrote:
In the so-called 'deprecated' block, the 2nd Hebrew
block in the BMP, are composed Hebrew points which I
plan to go on using.  And I expect everyone else to go
on using them also, all Hebraists.  We think they are
needed for 'text representation' of shin and sin.  
It really is a better idea to use the decomposed forms, and to allow text representation 
to be handled at the glyph level.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType not for Open Communication?

2004-12-07 Thread John Hudson
John Cowan wrote:
OpenType is a trademark of Microsoft and a proprietary font format 
jointly developed by Microsoft and Adobe. 

The question is, is it an open standard?  That is, is anyone free to
create OpenType fonts, OpenType font tools, OpenType font renderers?
Is the documentation freely available at no more than nominal cost?
Yes.
There are Apple patents that relate to TrueType renderers, but as far as I know that is 
the only IP, other than the usual trademark acknowledgements, that affects OpenType 
development and implementation.

MS and Adobe have both been incredibly helpful to font developers and font tool 
developers. This is one of the reasons for the format's relative success.

(Are there bespoke fonts which the buyer keeps to himself?)
Yes, but not usually because they *only* work for him. In ten years I've only made one or 
two fonts (with non-standard 8-bit character sets) that worked only in the private 
applications for which they were made.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType not for Open Communication?

2004-12-06 Thread John Hudson
Peter R. Mueller-Roemer wrote:
With 'it' you refer to OpenType? So OpenType are typefaces = fonts 
that are only open by leaving technical details unrestricted to 
font-designers, text-processing-software?
Then its name is another MISNOMER (the word Open can't be made 
proprietary by itself, so it is not illegal) that a lot of customers 
MISUNDERSTAND, and thus it is MISLEADING and unfair to your customers.
OpenType is a trademark of Microsoft and a proprietary font format jointly developed by 
Microsoft and Adobe. It was originally developed by MS as TrueType Open, and the name was 
changed to OpenType after Adobe became involved and the format embraced PostScript outline 
data. In both cases, the 'open' in the name refers to the fact that the format is 
extensible in terms of the amount and kind of layout intelligence built into the font. It 
is open compared to the earlier sfnt font format (TrueType).

'Open' in the OpenType name has never implied open in the sense of open source, a software 
phenomenon that only really became big news after the development of OpenType. I suspect, 
if the format were invented today, MS would have chosen some other name, since they are at 
pains to dissociate themselves from much open source software.

The Unicode-Standard I hope is Open in the sense that any font that is 
designed to this standard may call itself a unicode-font (complete or 
partial ...).
Unicode is a text encoding standard. Fonts and other software implement the standard. The 
'openness' of the standard doesn't imply anything about the 'openness' of the software.

Unicode has a great potential to remove the language-specific boundaries 
from web-communication, but if almost equivalent fonts (SW to read, 
write and print) are not freely available for private use, then its 
acceptance will not be so wide as is necessary to enable multi-lingual 
communication!
Font developers are under no obligation to provide you with free fonts. Do you not charge 
for your work? If you want fonts to be freely available, you have to find some way to pay 
for their development, e.g. the model of the SBL Font Foundation, which is raising funds 
from partner organisations to pay for free fonts for Biblical scholarship.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: [hebrew] Re: proposals I wrote (and also, didn't write)

2004-12-06 Thread John Hudson
Mark E. Shoulson wrote:
I don't know.  I try to avoid politics, if possible.  The significance 
of what I'm saying is that you have made a good start in your proposal, 
that it has some shortcomings, and that I hope to be able to help put 
something more complete together.
It would be great if there were eventually a proposal, based on all your contributions, to 
which you would all be happy to attach your names, and which would be recommended to the 
UTC by all interested parties.

I'll have to look closely at samples again, but it seems to me that the 
accent marks are not pointing and thus not combining marks (though the 
vowel points of course are combining marks).  They appear to be used 
more as punctuation than as letter-diacriticals.
Do you mean that they are spacing characters?
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType vs TrueType (was current version of unicode-font)

2004-12-04 Thread John Hudson
Philippe Verdy wrote:
What is strange also is that the www.opentype.org web site is a page 
whose title refers to Arial Unicode MS. Isn't it a Microsoft font? 
These things all combined are very intriguing.
Arial is a Monotype face: design, copyright, trademark. Always has been. Arial Unicode MS 
is one font in the Arial family, made for MS but still Monotype's IP.

Is there a way outside OpenType for other system vendors than Microsoft 
and Apple? This standard looks more and more proprietary...
It has always been a proprietary font format. It has never been anything but proprietary. 
It doesn't claim to be a 'standard': it is a font format that happens to be more widely 
supported than other font formats.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType vs TrueType (was current version of unicode-font)

2004-12-04 Thread John Hudson
Antoine Leca wrote:
On Behalf Of Christopher Fynn
If a Windows application needs to properly display Unicode text for
languages such as Hindi, Tamil, Bengali, Nepali, Sinhala, Arabic,
Urdu and so on then it probably needs to support OpenType GSUB and
GPOS lookups.

Not just probably.

Well, there are other rendering technologies than Uniscribe; and some of
them even succeed at displaying complex scripts...
For a contrived yet verifiable (OpenSource) example, let's have a look at Eric
Mader's LayoutEngine (in ICU) using Apple (GX) fonts with a Unicode cmap.
And yes I am talking of something that can run on Windows.
I think Peter's point was that complex scripts require font layout tables (note that he did 
not mention Uniscribe, which is an MS text engine that is a *client* of OpenType fonts and 
the OpenType Layout Services library), whereas Chris had suggested that they 'probably 
need' them. An Apple AAT (GX) font also includes layout tables, although using a different 
approach than OT.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-12-04 Thread John Hudson
Mark E. Shoulson wrote:
That said, I have nothing against using NBSP and various other tricks 
and winding up supporting this.  Even the INVISIBLE LETTER might make 
sense in some settings (e.g. where you have something to be drawn in 
later but the diacritic is printed now, for some reason).  Just that I 
don't consider qere/ketiv per se a very convincing argument in a 
plain-text domain.
For many plain-text purposes (searching, sorting, comparison, etc.) I would agree with 
you, and would expect the qere and ketiv to be separately encoded. But the fact remains 
that there is *also* a need to render the merged forms, to display them and to print them. 
This means that there need to be font mechanisms to arrange base and combining mark 
glyphs, and for better or for worse such font mechanisms interact directly with character 
strings, not with 'markup'. The only available input for glyph processing is glyphs, and 
the first level of input is via the font cmap table from the character string of the plain 
text. The famous 'higher level protocols' that are supposed to look after rendering are 
built on top of the plain text: they are not separated from that text.

By all means recommend that for most purposes ketiv and qere should be separately encoded: 
there are lots of good reasons to do so. But don't ignore the need to correctly display 
the merged forms, which is a textual problem requiring a solution that is at least in part 
character-level.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType vs TrueType (was current version of unicode-font)

2004-12-03 Thread John Hudson
Gary P. Grosso wrote:
First, I see an O icon, not an OT icon in Windows' Fonts folder for some fonts and a TT icon for others.  Nothing looks like OT to me, so are we talking about the same thing?

Next, if I double-click on one of the fonts (files), I get a window which shows a sample of the font, at the top of which is the font name, followed by either (OpenType) or (TrueType).  Can I believe what that says as indicative of whether this is truly OpenType or TrueType?
An OpenType font can contain either TrueType or Postscript (CFF) outlines and hints. A CFF 
flavour OT font, which will have the .otf extension, gets the O icon automatically, as 
determined by the presence of a 'CFF' table in the font. A TT flavour OT font may have 
either the .ttf or .otf extension (more likely the former, for backwards compatibility 
reasons), but will only get the O icon if the font contains a digital signature 'dsig' 
table. The reason for this is that the 'dsig' table is the only (optional) TT applicable 
table that was added to the OT spec that was not already part of the TT spec.
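
A minimal Python sketch of that table-presence check -- it reads only 
the sfnt table directory, assumes a single well-formed font file (not 
a TTC), and omits error handling:

import struct

def sfnt_table_tags(path):
    # Read the sfnt header (12 bytes) and the table directory records
    # (16 bytes each: tag, checksum, offset, length).
    with open(path, "rb") as f:
        header = f.read(12)
        num_tables = struct.unpack(">H", header[4:6])[0]
        return [f.read(16)[:4].decode("latin-1") for _ in range(num_tables)]

def icon_for(path):
    tags = sfnt_table_tags(path)
    if "CFF " in tags:
        return "O"                           # CFF flavour: O icon automatically
    return "O" if "DSIG" in tags else "TT"   # TT flavour: O only with a DSIG

# print(icon_for("SomeFont.ttf"))   # hypothetical file name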

The confusion for users is that the icon does not actually tell you anything very 
interesting or useful about the font, because what one really wants to know is whether the 
font contains OpenType Layout feature tables for glyph substitution and positioning. There 
was some talk at MS of changing the icon system in Longhorn, so that the O icon would 
reflect the presence of OTL tables in the font, but I don't know whether this will 
actually happen.

Mostly how this comes up is we have customers ask if we support OpenType fonts, to which I reply with some variation of it depends.  I usually say the OpenType spec is complex, but we handle all the commonly-used fonts we know of, and follow it by saying that they can look in their Fonts folder (at the icon) to see some examples of OpenType fonts.  So that is the background for my questions.
The issue of supporting OT fonts is complex because it can mean several different things. 
The OpenType file format is very widely supported (i.e. installable). Both outline and 
hint flavours are widely supported (i.e. rasterised). The OpenType Layout tables and 
features -- the stuff that most users think of when they hear 'OpenType' -- are supported 
to different levels in different system and application mixes, and are also likely to 
enjoy more support for some writing systems than others.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: OpenType vs TrueType (was current version of unicode-font)

2004-12-03 Thread John Hudson
Philippe Verdy wrote:
However the OpenType web site is apparently fixed only to this 
presentation page, with a single link to MonoType Corporation, not to 
the previous documentation hosted by Microsoft.

Is Microsoft stopping supporting OpenType, and about to sell the 
technology to the MonoType font foundry?
No. Monotype grabbed the OpenType domain name (and a lot of other type 
related domains).
The OpenType specification and other documentation is all at the MS Typography 
site:
http://www.microsoft.com/typography
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: Relationship between Unicode and 10646

2004-11-30 Thread John Hudson
Sarasvati wrote:
All discussions of Phoenician on this list have been declared closed.
Philippe and others, please discuss Phoenician elsewhere, and refrain
from replying specifically to Phoenician issues here.
Point of information:
Marc Wilhelm Küster has set up an e-mail list for the discussion of Phoenician, so that 
those who are interested may discuss the encoding and implementation issues without 
deluging those who are less interested. One may subscribe to this list by sending an 
e-mail to:

[EMAIL PROTECTED]
John Hudson (who is not subscribed to that list)
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-11-27 Thread John Hudson
Jony Rosenne wrote:
Jony, what do you think plain text is? Why should the arrangement of text on a page as a
marginal note be considered any differently from text anywhere else *in its encoding*?
Are you suggesting that Unicode is only relevant to ... what? totally unformatted text
in a text editor?

Basically, yes. Except for the control codes in Unicode - spaces, line feed,
carriage return, etc.

To indicate formatting one uses markup.
And markup is applied to what? Obviously, to text.
It seems to me that the primary purpose of the plain text limitation in Unicode is to 
maintain the character/glyph distinction, so that it is clearly unnecessary to encode 
display entities such as variant glyphs, ligatures, etc. separately from the underlying 
character codes that they visibly represent in various ways. On this basis, I think there 
is a sound argument to be made against encoding an 'invisible letter', if there is an 
existing character -- such as NBSP -- that logically and effectively serves the same 
purpose in encoding a particular piece of text. But it *is* a piece of text, however 
malformed it might seem from normal lexicographic understanding. It may not be a word. It 
may, in fact, be two words merged into a unit. But it is most certainly text.

The idea that the position of such text on a page -- as a marginal note -- somehow demotes 
it from being text, is particularly nonsensical.

But I'm now, as always, happy to hear alternate suggestions as to how things might be 
handled in either encoding or display. So if you think merged Ketiv/Qere forms should be 
handled by markup, perhaps you can explain how, so that I might better understand. Thank you.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-11-27 Thread John Hudson
Mark E. Shoulson wrote:
Well, that's the difference under discussion.  The plain text would 
seem to be either the qere or the ketiv (but not the combined blended 
form), since each of those is somewhat sensible. 
Is there some place in the standard where it says text must be sensible?
JH
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-11-26 Thread John Hudson
Jony Rosenne wrote:
One of the problems in this context is the phrase original meaning. What
we have is a juxtaposition of two words, which is indicated by writing the
letters of one with the vowels of the other. In many cases this does not
cause much of a problem, because the vowels fit the letters, but sometimes
they do not. Except for the most frequent cases, there normally is a note in
the margin with the alternate letters - I hope everyone agrees that notes in
the margin are not plain text.
Jony, what do you think plain text is? Why should the arrangement of text on a page as a 
marginal note be considered any differently from text anywhere else *in its encoding*? Are 
you suggesting that Unicode is only relevant to ... what? totally unformatted text in a 
text editor?

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread John Hudson
Jony Rosenne wrote:
Ketiv and Qere, where two different words are written together, are not plain
text and are thus out of scope for Unicode. 
Writing them in a combined way results in some sequences of characters that are very 
problematic from a rendering perspective, but there is a long standing tradition of 
writing them in combination. Saying that people should cease writing them as they have 
been written, and write them only separately doesn't seem to me to be much of a solution.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread John Hudson
Jony Rosenne wrote:
This isn't what I said. I said it isn't a Unicode problem because it isn't
plain text. 
And I don't understand how you are making this distinction between writing two words 
separately being plain text and combining them being not plain text. In what way is it not 
plain text? Why couldn't it or shouldn't it be plain text?

If I write, using Latin letters, 'YaHoWaiH' as a combination of the words YHWH and adonai, 
for instance, in what way is the former not just as much plain text as the latter? What 
makes the case fundamentally different for Hebrew?

I'm just trying to understand the basis of your insistence that traditional Ketiv/Qere 
combinations are not plain text. I can understand how and why one might implement them not 
as plain text, but this is not the same as determining, a priori, that they are not and 
cannot be plain text.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Peasant of the Garonne, by Jacques Maritain
Art and faith, by Jacques Maritain & Jean Cocteau
Difficulties, by Ronald Knox & Arnold Lunn


Re: [A12n-Collab] Latin alpha (Re: Public Review Issues Update)

2004-08-30 Thread John Hudson
Donald Z. Osborn wrote:
According to data from R. Hartell (1993), the latin alpha is used in Fe'efe'e (a
dialect of Bamileke) in Cameroon. See
http://www.bisharat.net/A12N/CAM-table.htm (full ref. there; Hartell names her
sources in her book). Not sure offhand of other uses, but I thought it was
proposed for Latin transcription of Tamashek in Mali at one point (I'll try to
check later). In any event it would seem easy to confuse the latin alpha with
the standard a, which would seem to either require exaggerated forms (of the
alpha, to clarify the difference) or limit its usefulness in practice.
The Latin alpha is usually distinguished from the regular Latin lowercase a by making the 
latter a 'double-storey' form, whereas the alpha is a single-storey form. Of course, this 
means that the distinction cannot be adequately made in typefaces with a single-storey 
lowercase a, such as Futura.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Mass in slow motion, by Ronald Knox
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Cambodian System of Writing (textbook) now available to download as a free etext

2004-06-16 Thread John Hudson
Peter Constable wrote:
Cambodian System of Writing and Beginning Reader with Drills and Glossary
Franklin E. Huffman, with assistance from Chhom-Rak Thong Lambert  Im 
Proum.

http://pratyeka.org/csw/
This text is also in print again, for those who still like their books on 
pulped tree:
http://www.amazon.com/exec/obidos/tg/detail/-/0300013140
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Seven Storey Mountain, by Thomas Merton
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: proposal for a creative commons character

2004-06-15 Thread John Hudson
The Creative Commons (http://www.creativecommons.org) is devoted to
expanding the range of creative work available for others to build upon
and share.  While technically they use copyright to do so, the creative
commons has this neat symbol that looks just like a copyright (00A9)
except that it has two letter c's inside the circle instead of one. 
Thus, it looks like (cc) instead of (c).  There are some other symbols
they have also created which can be seen on this page:
http://creativecommons.org/license/

Without getting greedy, I'd like to propose the adoption of the (cc)
symbol in whatever way would be most expedient (so that creative commons
authors can identify their work more appropriately), and leave for later
the question of the other symbols.
Well, I have a logo too and it sure would be swell to be able to 'identify my work more 
appropriately' in plain text. But Unicode does not encode logos or other idiosyncratic marks.

We had the same discussion a couple of years ago with the 'Copyleft' people, who wanted 
their own open source collaborative effort's logo encoded. Maybe if that had happened we 
could now have a fun argument about whether or not the Creative Commons logo is a glyph 
variant of the Copyleft logo. :)

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Seven Storey Mountain, by Thomas Merton
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: proposal for a creative commons character

2004-06-15 Thread John Hudson
1.  The Euro symbol is a logo of the new European currency.

Yes, but it is not _just_ a logo. It is a logo which found its way into
plain text. It is quite usual for a plain text to  use the euro logo instead
of the EUR currency abbreviation.
I wouldn't even use the term 'logo' for the euro symbol. It is a currency symbol just like 
the $ sign. The fact that it was invented by a committee and didn't develop organically 
over time does not make it a logo, and it has very quickly developed all the 
characteristics of other currency symbols, including great variation of form and 
typographic representation. Furthermore, it is a symbol specified by, recognised by, and 
encoded by national standards bodies. Unsurprisingly, if a government comes along and says 
'We have this legal symbol that means X and we have a need to use it in plain text', that 
symbol tends to get encoded.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
The Seven Storey Mountain, by Thomas Merton
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread John Hudson
[EMAIL PROTECTED] wrote:
Even with OpenType experimental support here, my display looks like
the GIF you sent.  I'll try fixing this, 

Um, good luck. I am not sure it is possible to correctly position 
double-diacritics with OpenType logic. Specifically, the vertical position 
of the double-diacritic must be adjusted so that it is above the *taller* 
of the preceding and following combining sequence. AFAIK, such logic isn't 
feasible in OpenType.
You could handle it fairly easily by contextually substituting a glyph variant of the 
double-diacritic at a different height.
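
A toy Python sketch of that contextual-variant approach -- invented 
glyph names and a made-up 'tall' class; in a real font this would be a 
contextual GSUB lookup selecting an alternate glyph:

# Substitute a raised form of the double diacritic when either
# neighbouring glyph is tall. Names and classes are illustrative only.

TALL = {"b", "k", "l"}   # hypothetical tall base glyphs

def raise_double_diacritics(glyphs):
    out = list(glyphs)
    for i, g in enumerate(out):
        if g == "doublebreve" and (
            (i > 0 and out[i - 1] in TALL)
            or (i + 1 < len(out) and out[i + 1] in TALL)
        ):
            out[i] = "doublebreve.raised"   # variant drawn at a greater height
    return out

print(raise_double_diacritics(["a", "doublebreve", "b"]))
# ['a', 'doublebreve.raised', 'b']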

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Archaic Greek letter like palm tree?

2004-06-05 Thread John Hudson
E. Keown wrote:
I'm looking for an archaic Greek letter (from Crete
and possibly elsewhere) which to me looks like a small
drawing of a palm tree.  It has a trunk and two
identical palm frond branches, one to the right, one
to the left ... is this in Unicode, in process, ...?
That sounds suspiciously like one of the recognised forms of the Ypsilon in 
non-archaic Greek.
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-06-02 Thread John Hudson
Ted Hopp wrote:
Let me rephrase the point as a question:
 What in the encoding of 'Phoenician' characters in Unicode
 obliges anyone to use those characters for ancient Canaanite
 texts?

An analogous statement can be made of any script in Unicode. We can all
continue to use code pages or the myriad Hebrew fonts that put the glyphs at
Latin-0 code points. If the proposed Phoenician block can be so easily
ignored in encoding ancient Canaanite texts, then is the block really
needed?
Ironic to find myself arguing the other side of this debate, having been broadly 
sympathetic to the semiticist objections to the proposal, but here goes...

Note that I was not ever suggesting using myriad codepages, font hacks or other methods to 
encode ancient Canaanite texts. My point was that *within Unicode* one would have an 
option whether to encode these texts using the Hebrew characters or 'Phoenician' 
characters. The option, of course, may be a source of confusion, as choices often are. But 
my point is that no one is forced to choose one or the other.

There are people who do not want to distinguish the encoding of ancient Canaanite from 
square Aramaic. But there are also people who do want to distinguish them. Both groups of 
people include respected scholars and experts in their fields.

Somehow (how?) forcing the former group of people to use Phoenician characters for their 
texts would make them unhappy.

Not separately encoding 'Phoenician' characters, so that there was no way to distinguish 
in plain text, would make the latter group of people unhappy.


What was insincere about my posting? Forgive me, but it seemed to me that
when you claim that Semiticists will be able to ignore the Phoenician block,
there is an implication that they will use something else. I never said that
they would have to ignore Unicode altogether, but they will have to develop
their own standards (agreements, if you prefer) for what that something
else will be.
But the whole basis of the discussion to that point had been that some semiticists wanted 
to use the existing Hebrew block. The 'something else' is Hebrew, already encoded in 
Unicode and supported by much existing software. As far as I could tell, no one was 
suggesting developing some 'new standard'.

This frames the discussion in a way that ignores the coercive power of
Unicode in the marketplace.

One could, with only a little imagination, foresee that there will be
software packages that will only display Palaeo-Hebrew fonts for text
encoded in the 'Phoenician' block...
This frames the discussion in a way that ignores basic concepts of font and software 
interaction. A software package has no way of knowing whether the glyph encoded at U+05D4 
is Aramaic square script, stam, rashi, modern cursive or palaeo-Hebrew. If your *text* is 
encoded using Hebrew characters, you can display it in any font that supports those 
characters, regardless of the glyph shape mapped to those characters in the font. If your 
text is encoded using Phoenician characters, the same applies: any font that supports 
those characters can be used.

Moreover, if anyone wanted to use Phoenician in some future http protocol,
Unicode conformance is required (at least so says the standard).
What does that have to do with how semiticists decide to encode *texts*? If you want to 
encode Palaeo-Hebrew texts using Hebrew characters, you are going to have a Hebrew 
document. Phoenician is only relevant at all if you decide to use Phoenician characters 
and produce a Phoenician document. This is what I mean when I say there is no reason not 
to ignore the Phoenician characters if they do not suit your purpose.

Now, all that said, I still remain concerned that the people who want to distinguish 
'Phoenician' from Aramaic square script and other Hebrew script styles in plain text have 
not thought through the larger implications of encoding 'significant' nodes from a script 
continuum. Encoding a single 'Ancient Near-Eastern 22-letter Alphabet', whether you're one 
of the people who wants to use it or not, doesn't strike me as a significant problem. 
Encoding half a dozen of these 'nodes' might be, because with each additional structurally 
identical script the number of choices and likely confusion increase.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Definition of Script etc.

2004-05-31 Thread John Hudson
Adam Twardoch wrote:
Very recently at the Polish TeX Users' Group meeting prof. Janusz Bień
suggested that the only viable definition of the Unicode character is:
Character is a primitive term that is defined by enumeration.
So a script in Unicode is a labelled subset of enumerated entities. I like that :)
John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Phoenician & Kharoṣṭhī proposals

2004-05-29 Thread John Hudson
Christopher Fynn wrote:
I find it interesting to compare the furore over the Phoenician 
proposal with the total calm over the Kharoṣṭhī proposal [N2732] - an 
archaic script in which some Sanskrit and Sanskritized Gāndhārī texts 
occur.

Couldn't the same arguments the Semiticists who would unify Phoenician 
with Hebrew are making be just as easily made by Sanskritists to say that
Kharoṣṭhī should be unified with Devanagari? After all ancient Sanskrit 
texts in whatever script are traditionally written and published in 
Devanagari or Latin transliteration by scholars that deal with them, 
just as it is claimed that Phoenician texts are written and published in 
modern Hebrew characters or transliteration by scholars.
I don't think this is a helpful comparison, Chris. The point has been made several times 
that the Phoenician/Palaeo-Hebrew/Hebrew issue is not about transliteration but about how 
semiticists -- for a long time -- have viewed the ancient semitic writing system, i.e. as 
a script continuum, not as separate scripts. It is clear from the discussions that some 
semiticists consider this to be more fundamentally important than others, and think this 
view should be reflected in the encoding, while others may share the view but not agree 
that it necessarily be reflected in the encoding, and still others might share the view 
but have a desire or need to distinguish parts of the continuum in plain text. Comparisons 
to other writing systems -- whether they be Fraktur or Kharoṣṭhī -- are not helpful 
because they do not necessarily share a comparable 'view' of the identity of the writing 
systems. I have been thinking today that part of the reason for the debate is that Unicode 
has a singular concept of 'script', a bucket into which variously shaped concepts of 
writing systems must be put or rejected. I don't think there is anything conceptually 
wrong with the idea that specific instances of a single script might be separately encoded 
if there is a need or desire to distinguish them in plain text. It just happens that 
Unicode has only one word that can be applied to such instances, and that is 'script'. It 
seems clear to me now that what Unicode calls a script needn't necessarily be what 
semiticists, or anyone else, calls a script. A functional Unicode definition of script 
might be formed as: a finite collection of characters that can be distinguished in plain 
text from other collections of characters.

There are very real issues of software implementation, font development, collation, text 
indexing and searching, etc. that arise from encoding multiple instances of what some 
users consider a single script, whether users in general opt to make the distinction in 
plain text or not, by using the separate character collections or unifying text in a 
single character collection and making the distinction at a higher level. I'm beginning to 
think that our time would be better spent thinking about those issues.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: [BULK] - Re: Phoenician, Fraktur etc

2004-05-28 Thread John Hudson
Mike Ayers wrote:
Ummm - let me get this right.  Some people who are using these 
characters tell us that they need to fundamentally distinguish them from 
Hebrew characters, but that's not a good case.
As Ken pointed out, what has been expressed is a *desire* to distinguish in plain text, 
i.e. some people *want* to do this. This keeps getting referred to, however, as a *need*. 
I've asked for clarification of this 'need' because I want to understand why someone would 
want this distinction. So far, all the responses have been hypothetical. I'd really like 
to see some real world situations arising from work that someone is doing with ancient 
semitic writing in which there is a need for plain-text distinction of two or more ancient 
semitic scripts.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: PH technical issues (was RE: Why Fraktur is irrelevant)

2004-05-27 Thread John Hudson
Peter Constable wrote:
That question has been answered. So far, the responses to the answers
provided haven't been exactly deafening. Does nobody in the
pro-unification camp have any response? Is nobody willing to give
acknowledgement to the problems presented?
My response was to ask what need there was for a plain-text distinction in the 
circumstance in which one was identified as a 'need'. The only response I received back 
was John Jenkins' 'What if someone were trying to read their Hebrew e-mail at an Internet 
cafe that only had a Palaeo-Hebrew font installed?' hypothesis. Aside from the sheer 
unlikeliness of this, this threw the discussion back on the legibility criterion, but I'd 
really like to know what prompts other users of 'Phoenician' to *need* a separate 
encoding. I understand that some people may *want* a separate encoding, but I have not yet 
seen a *need*, i.e. something that requires a distinction in plain-text. And I would like 
to see such a need clearly explained, with material examples, because then we could stop 
arguing until Michael submits his next semitic script proposal.

The concern I have is not so much with the Phoenician encoding per se, but with the 
encoding of 'significant nodes' -- to use Michael's phrase -- on a script continuum. While 
this might make sense to scholars dealing with isolated atomic instances of that 
continuum, it is not going to make sense to scholars dealing with the continuum as a 
whole, for whom the structural identity of the 'diascripts' within the continuum is much 
more important than their visual dissimilarity at specific places and times. There are 
'technical issues' -- in the same sense that there are technical issues prompting some 
people to want a separate Phoenician encoding, i.e. usage issues -- that arise in trying 
to do scholarly work in a script continuum that is variously encoded as multiple scripts. 
These issues may not be sufficient to overcome the conflicting 'needs' of other scholars, 
but they should not be ignored on that basis. In particular, if Unicode encodes a number 
of 'significant nodes' on the semitic script continuum, how should the standard be used 
to encode texts that fall between the nodes? This is an issue even if one accepts the 
concept of nodes, i.e. of a linear continuum with clearly identifiable chronological or 
cultural script instances. Dean has, convincingly I think, presented examples of 
overlapping of use of such 'nodes' among ancient communities, making it harder to 
distinguish them from within the continuum.

John Hudson
--
Tiro Typeworks    www.tiro.com
Vancouver, BC     [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: PH technical issues (was RE: Why Fraktur is irrelevant

2004-05-27 Thread John Hudson
Peter Constable wrote:
But if one can only
point to cases of (say) documents from a given community containing 0
and .6, or 0 and .9, then it would seem that the nodes had some
conceptual validity within that community.
I don't doubt that nodes had 'conceptual validity' in the ancient community, but what was 
the concept? We subdivide the writing of the Latin script into historical and regional 
nodes (themselves open to dispute), but we recognise only a single script. I think Simon 
Montagu made the point regarding the 'conceptual validity' of the nodes very well today:

...the limited evidence seems to suggest that Palaeo-Hebrew
and Square Hebrew were viewed as font variants by Hebrew
speakers 2,000 years ago, and as separate scripts by Hebrew
speakers today.
Of course the term 'font variant' is anachronistic, but Simon's observation captures the 
essence of the shift from seeing two styles of the same semitic script to seeing two 
different scripts. If nothing else, this should help us understand why any insistence on 
the identity of these scripts as being either obviously unified or obviously distinct is 
unlikely to get us anywhere. The identity very much depends on the perspective of the 
observer.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-26 Thread John Hudson
James Kass wrote:
Obviously Palaeo-Hebrew is a modern term; the concept is however a very
old one - just look at the Dead Sea scrolls, turn-of-the-era Jewish
coins, etc., where it is employed in an archaizing way.

My pocket change is depressingly modern.
That needn't be an obstacle to the argument going full circle yet again. Hebrew and 
Palaeo-Hebrew letters occur side-by-side on some modern Israeli coins also. See the 
photograph near the bottom of this Typophile discussion:

http://www.typophile.com/forums/messages/4101/27209.html
John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Classification; Phoenician

2004-05-25 Thread John Hudson
Christopher Fynn wrote:
*All* classification is arbitrary.

If script classification is arbitrary or nominal, isn't there still a 
case for attempting some consistency or following a single model within 
a particular standard like the UCS? 
Indeed there is. If a single, one-size-fits-all model can't be described, at least a 
series of guidelines could be, i.e. guidelines formalised by the UTC not simply endorsed 
by one person who happens to have proposed a lot of stuff to be encoded. I have a huge 
amount of respect for Michael and his achievements, but I think something other than his 
opinion and endorsement is necessary to formally clarify the grounds for encoding or not 
encoding contentious writing systems. The absence of such guidelines promotes the kind of 
circular debate we've seen over Phoenician. Appeals to precedence should be considered, 
but I don't think they are convincing in themselves because everyone knows that precedence 
also involved an absence of specific guidelines and decisions that, arguably, could or 
should have been different. As noted previously, guidelines are more important for dealing 
with historical scripts than for living ones, since more contentious questions tend to be 
raised by such writing systems.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Proposal to encode dominoes and other game symbols

2004-05-25 Thread John Hudson
Andrew C. West wrote:
I've never quite worked out what purpose U+2616 [WHITE SHOGI PIECE] and U+2617
[BLACK SHOGI PIECE] are intended for.

The standard game of shogi (Japanese Chess) has 20 uncoloured tiles on each
side, with a kanji inscription giving the piece's name on each tile.
In discussions of shogi games, one player is conventionally called 'Black' and the other 
'White', but as you note this has nothing to do with the colour of the pieces. I would 
like to know what the presumed purpose of U+2616 and U+2617 is. If it is indeed to be able 
to represent shogi game pieces, then the glyph representation shown in the Unicode charts 
might be changed: both pieces should be white in colour, but facing in opposite directions.

Each side's
20 tiles are identical (differentiated by orientation not by colour) except for
the general.
Not so. Both sides have four generals: two 'gold' and two 'silver'. The gold and silver 
generals differ from each other, but each side's pieces are entirely identical.

By the way, if any Unicoders play shogi, I could bring my travel set next time I come to 
the conference.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Fraktur Legibility (was Re: Response to Everson Phoenician)

2004-05-25 Thread John Hudson
[EMAIL PROTECTED] wrote:
Dean Snyder scripsit:

So, you are saying there are glyph streams in German Fraktur that fluent,
native Germans would have trouble reading. 
This reminds me of a game played by scriptorium monks in the Middle Ages. The textura 
style of blackletter, especially when written in a compressed manner, consists of many 
identical or near-identical letter strokes forming key letters. Monks amused themselves by 
coming up with words and sentences made up entirely of as many such letters as possible. 
When written in a compressed textura hand with tight letterspacing such words and 
sentences become completely illegible. The following is my favourite example, although 
the l, o and t in the last word make it an impure sample:

mimi numinum nivium minimi munium nimium vini
muniminum imminui vivi minimum volunt
which roughly translates as:
The very short mimes of the snow gods do not wish
at all that the very great burden of distributing the
wine of the walls will be lightened in their lifetime.
John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
James Kass wrote:
Also, I'm having trouble understanding why Semitic scholars wouldn't
relish the ability to display modern and palaeo-Hebrew side-by-side
in the same plain text document.  
Because they want to search documents in the Hebrew *language* using Hebrew characters in 
search strings? Because they don't want to guess in what script variant an online corpus 
is encoded when doing searches? Because plain-text distinction of script variant text in 
the same language is just about the least important thing in their work? Because they have 
yet to see a good argument for why anyone would need to make such a distinction?

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
Michael Everson wrote:
Why, James, we gave evidence a month ago that the ancient Hebrews 
considered it to be a different script than the one they had learned in 
exile.
To be fair, it isn't at all clear from your evidence that the Ancient Hebrews had the same 
concept of 'script' as the Unicode Standard. I don't recall anything in what you cited 
that suggested anything more significant than a recognition of a change in the style of 
writing *the same Hebrew letters*, or as they might have said, if they did use Unicode 
parlance, the same abstract characters.

The fact that they acknowledge that particular styles of writing are or are not 
appropriate for religious texts is neither surprising nor relevant, as the same 
distinctions are made between ktiva merubaat and stam.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
saqqara wrote:
I'm genuinely interested in why Phoenician should not be regarded as a
separate script but have yet to read a reasoned response to earlier posts.
I think the view may be most succinctly expressed in this way:
  The numerous and visually varied 22-letter semitic writing
  systems all represent the same 22 abstract characters.
  The Unicode Standard encodes abstract characters.
  Ergo, only one set of codepoints is required to encode the
  22-letter semitic writing systems.
John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
James Kass wrote:
Because they want to search documents in the 
Hebrew *language* using Hebrew characters in 
search strings?

Because they don't want to guess 
in what script variant an online corpus is encoded 
when doing searches?

Guessing's not their job.  It's up to a sophisticated search
engine to find what users seek.  Some of us have tried to
dispel some of these fears by pointing out possible solutions.
Indeed, and I have made similar points to my semiticist and Biblical scholarship friends 
and correspondents regarding methods for working around the canonical combining class 
problems for Hebrew, and generally try to help people realise that the aspects of Unicode 
that seem to them 'broken' are not necessarily an impediment to getting work done. 
However, all this has left the understandable impression among many of these people that 
Unicode almost goes out of its way to make things difficult for people working with 
ancient Hebrew texts. Things that should be simple end up being complicated and require 
the development of sophisticated systems to perform simple tasks. Now the perception seems 
to be that in order to facilitate plain-text distinction of 'Phoenician' and Hebrew, yet 
more complexity and sophistication will be required to encode, search and study ancient 
texts. Frankly, I don't blame people for asking whether that distinction is worth the trouble.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
Michael Everson wrote:
To be fair, it isn't at all clear from your evidence that the Ancient 
Hebrews had the same concept of 'script' as the Unicode Standard. I 
don't recall anything in what you cited that suggested anything more 
significant than a recognition of a change in the style of writing 
*the same Hebrew letters*, or as they might have said, if they did use 
Unicode parlance, the same abstract characters.

But we *do* and we have the history of the world's writing systems which 
leads *us* to consider these distinctions, in order to encode the world's 
writing systems in the Universal Character Set as more than a set of 
font variations on the alphabet.
No one is suggesting the latter. What is being suggested is that in considering the 
position of semitic scripts in the history of the world's writing systems the opinion of 
semitic scholars should not be secondary to that of generalist writers, most of whom have 
addressed ancient semitic scripts only from the perspective of their historically assumed 
contribution to Greek civilisation.

Classification is an arbitrary process in which one produces useful categories into which 
to arrange an otherwise unwieldy body of knowledge. The classification of scripts in the 
general history of the world's writing systems is useful for writing general histories of 
writing systems. It does not necessarily represent the truth.

Unicode also classifies scripts and seeks to do so in a way that is useful for text 
processing. This is well and good. What I have found problematic in your defence of the 
Phoenician proposal, Michael, is your assumption that the classification of script used in 
histories of writing systems naturally corresponds to the classification of scripts in 
Unicode, such that the fact that a number of books call something a script means that it 
should have a separate code block in Unicode. When non-generalists state that this 
historical classification is not useful for text processing purposes, and indeed that they 
disagree, from a specialist perspective, with that generalist history, they deserve better 
than 'Of course it is a separate script, I have a lot of books that say it is'.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
Michael Everson wrote:
We have statements from real Semiticists who do not want their names 
dropped into this fray that they support the encoding of Phoenician as a 
separate and distinct script from Square Hebrew.
Are these statements going to be registered as documents? It would be nice to know what 
reasons are given.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-24 Thread John Hudson
Michael Everson wrote:
  The numerous and visually varied 22-letter semitic writing
  systems all represent the same 22 abstract characters.
  The Unicode Standard encodes abstract characters.
  Ergo, only one set of codepoints is required to encode the
  22-letter semitic writing systems.

Oh, goody. Back to square 1.
To clarify: I was not positing this syllogism as a new argument, only seeking to express 
as succinctly as possible the underlying logic of the opposition to the Phoenician 
proposal. I don't think this logic is at all unreasonable, any more than I think many of 
the arguments in favour of the proposal are unreasonable. This is why I don't think any 
decision can be made on the basis of argument about the identity of 'scripts': there are 
good arguments for and against different ways of encoding ancient Canaanite writing 
systems. Yes, I think most of this debate has been a waste of time, but not because either 
side is obviously right and the other wrong.

As stated previously, the only useful question to ask -- and the only sensible target for 
those opposed to the proposal -- is whether there is really a 'need' for plain-text 
distinction of 'Phoenician' from Hebrew and, presumably, from some other forms of ancient 
Near Eastern writing. Patrick has, today, noted the existence of an inscription that 
includes both Punic and Neo-Punic forms: is this a distinction that someone might have a 
'need' to make in plain-text?

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

2004-05-24 Thread John Hudson
Peter Constable wrote:
I was not involved in those discussions so cannot comment on them. I 
just wish to point out that the MCW representation of Hebrew most 
certainly *is* supported in Unicode: MCW uses ASCII Latin letters and 
punctuation characters to stand for Hebrew letters, vowel points and 
accents, and those exact same ASCII characters are encoded in Unicode. 
This was an 8-bit hack. The point that Elaine and other Biblical Hebrew scholars make is 
that MCW explicitly encodes distinctions between some marks, based on positioning, that 
the Unicode Hebrew block unifies. This means that while MCW text can be easily converted 
to Unicode Hebrew, it is not possible to round-trip such conversion in the same way that 
Unicode provides for pre-existing 8-bit standard character sets. One of the unfortunate 
aspects of this is that the ASCII-hack MCW encoding will likely remain the source encoding 
for many electronic Biblical Hebrew texts for some time to come, even if published texts 
are re-encoded as Unicode Hebrew, since MCW permits simple and unambiguous plain-text 
encoding of distinctions that are important to textual analysis. For example, although my 
clients at Libronic use Unicode encoding for their electronic BHS edition (because it 
provides greater interchangeability), they maintain an MCW encoded text as their master 
source. So much for the 'universal' character set...
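
A toy illustration of that round-trip problem, with hypothetical mark codes m1 and m2 
rather than the real MCW table: suppose the legacy scheme distinguishes a positional 
variant of a mark that Unicode unifies into the single U+05BD HEBREW POINT METEG:

    # Hypothetical two-code distinction collapsing to one Unicode character.
    TO_UNICODE = {"m1": "\u05BD",   # e.g. a left-positioned mark
                  "m2": "\u05BD"}   # e.g. a right-positioned mark

    def to_unicode(tokens):
        return "".join(TO_UNICODE[t] for t in tokens)

    # The conversion is many-to-one, so no inverse mapping can recover
    # which source code produced the Unicode text:
    assert to_unicode(["m1"]) == to_unicode(["m2"])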

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Classification; Phoenician

2004-05-24 Thread John Hudson
Michael Everson wrote:
Classification is an arbitrary process in which one produces useful 
categories into which to arrange an otherwise unwieldy body of knowledge.

I dispute this. It is not arbitrary. Sometimes the cuts are difficult to 
make, because there is messiness in the data, but classification puts 
like with like and separates like from unlike. If it were arbitrary, we 
would not be able to distinguish abugidas from syllabaries, or trace the 
relationships between scripts and name the nodes on the tree.
*All* classification is arbitrary. This is a basic philosophical proposition. Note that 
arbitrary does not mean baseless or capricious, it just means that systems of 
classification are determined by the classifier, not by the thing classified. In 'putting 
like with like and separating like from unlike' the classifier exercises judgement based 
upon how he views the things being classified. This doesn't mean that the judgement is 
capricious, but it is arbitrary because the same set of things could be classified in a 
different way according to another, equally non-capricious set of criteria. You are making 
the basic philosophical mistake of assuming a correlation of your classification (e.g. 
abugidas vs. syllabaries) to description. As I said before, classification is a method of 
managing and making sense of large bodies of knowledge; the system of classification 
should not be confused with the body of knowledge itself. Classification is how we talk 
about what we know; it is not what we know.

In 'The Analytical Language of John Wilkins', Jorge Luis Borges famously described a 
system of classification of animals from a fictional Chinese encyclopaedia, in which 
animals are divided into 14 classes:

   1. those that belong to the Emperor,
   2. embalmed ones,
   3. those that are trained,
   4. suckling pigs,
   5. mermaids,
   6. fabulous ones,
   7. stray dogs,
   8. those included in the present classification,
   9. those that tremble as if they were mad,
  10. innumerable ones,
  11. those drawn with a very fine camelhair brush,
  12. others,
  13. those that have just broken a flower vase,
  14. those that from a long way off look like flies.
This influential system of classification (Foucault said that, on reading it, all the 
familiar landmarks of his thought were shattered) serves to demonstrate the philosophical 
proposition stated above: systems of classification are arbitrary.


The classification of scripts in the general history of the world's 
writing systems is useful for writing general histories of writing 
systems. It does not necessarily represent the truth.

It is not just useful for general work. It has been, and will be, 
useful in my own work in analyzing and encoding scripts.
As I would expect it to be, but be careful not to make that usefulness prescriptive. Just 
because a writing system has been classified as a script by a particular author or authors 
is not in itself grounds for encoding. The fact that many sources classify Phoenician as a 
separate script is indeed important and informative, in exactly the same way that it is 
important and informative to know that not everyone agrees with this classification or 
finds it useful.

I don't make the determinations that I make randomly, John, nor do I 
study the history of writing systems for the pleasure of it, or to 
publish to get tenure at a university.
Again, arbitrary does not mean random. Of course you don't make determinations randomly. 
You might, however, acknowledge that other ways of classifying writing systems may make 
more sense to other people and that to them your determinations are far from as obvious as 
you claim them to be. As I said very early in this discussion, encoding of historic 
scripts is almost certainly going to be more likely to engender disagreement and debate 
than the generally more obvious needs of modern scripts with neatly standardised 
orthographies and character sets and properties. This is going to require fuller 
justification of decisions, and the argument 'This is what we have done for other scripts' 
isn't going to get us very far.

Whether Phoenician is encoded or not, I sincerely hope that the outcome of this process is 
a better *mutual* understanding among the makers and users of Unicode.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Proposal to encode dominoes and other game symbols

2004-05-24 Thread John Hudson
Michael Everson wrote:
Here. Chew on this. :-)
N2760
Proposal to encode dominoes and other game symbols
Michael Everson
2004-05-18
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2760.pdf
This could get out of hand very quickly. Chinese and Japanese (shogi) chess pieces? 
Chaturanga pieces? Is there really a *plain-text* need for this stuff? At what point is it 
more practical to say 'use a graphic'?

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Classification; Phoenician

2004-05-24 Thread John Hudson
Dean Snyder wrote:
 It simply doesn't make
sense to me that we should do different things for Semitic than we do 
for Indic.

Is it not a factor that the Indic scripts are in everyday use by living
communities?
Not all of them are. It is, however, a factor that the Indic scripts have varying shaping 
behaviour, not all of which is easily addressable at the glyph level. There is a net 
benefit to text processing and display in not unifying their encoding.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-21 Thread John Hudson
Ted Hopp wrote:
I don't think this expectation is unreasonable,
given their perception of the standard, and perhaps Unicode needs to do a
better job in conveying what the standard is and does and how it can be used.

With all due respect, this is disingenuous.
It was intended to be helpful.
It's like saying that building a
highway does not, per se, imply much of anything about how housing
development will take place. Perhaps true in a strictly literal sense, but
laughable in the real world.
It is not remotely like saying that.
What are users supposed to do? Ignore the Unicode support implemented in
commercial software and develop their own encoding tools? Ignore Unicode and
develop an interchange standard that meets their needs? Unicode decisions
have far-reaching consequences. Don't minimize the responsibilities that
come with power.
Let me rephrase the point as a question:
 What in the encoding of 'Phoenician' characters in Unicode
 obliges anyone to use those characters for ancient Canaanite
 texts?
I think it is you who is being disingenuous, because I never suggested that users should 
ignore Unicode altogether or that they should develop their own standard, or any of the 
other things you suggest follow in some way from my observation that there is no reason 
why semiticists should not ignore the Phoenician block. What aspect of 'Unicode support 
implemented in commercial software' would semiticists and other users have to ignore in 
order e.g. to encode Palaeo-Hebrew texts using the Hebrew block? None.

I happen to think that the Phoenician encoding is unnecessary, but the sky isn't going to 
fall if it gets accepted. There are lots of unnecessary things in Unicode -- the entire 
Arabic ligature set for example -- that intelligent people simply don't use. Now, it 
happens that there are apparently some people who claim to have a plain-text *need* to 
distinguish Phoenician from Hebrew, i.e. someone disagrees that it is unnecessary. As far 
as I'm concerned, this is the only basis on which the Michael's proposal should be 
accepted or rejected, which means that those who oppose the encoding would better spend 
their time querying that need directly to the people who have expressed it than making 
silly, repetitive arguments about fraktur on this list.

John Hudson


Re: Response to Everson Phoenician and why June 7?

2004-05-21 Thread John Hudson
Dean Snyder wrote:
those who oppose the encoding would better spend 
their time querying that need directly to the people who have expressed
it than making 
silly, repetitive arguments about fraktur on this list.

Silly, it is not; repetitive, only because the argument is apropos, has
never been countered, and the same, non-analogous arguments along these
lines are being brought up repetitively.
And is swaying no one, hence silly. Someone -- anyone remember who? -- once defined 
stupidity as repeatedly doing the same thing while expecting a different result.

Dean, I happen to agree with many of the points you have made from your expert position, 
i.e. regarding the historical uncertainty about the origins of the so-called 
Phoenician script and its structural identity with Hebrew regardless of the entirely 
superficial glyph variation. Having spent much of the past year and a half working with 
semiticists and Biblical scholars, I've come to the conclusion that they know a heck of a 
lot more about semitic writing systems than typical Eurocentric writers of generic texts 
on the history and classification of writing systems. I think the expert comments of 
semitic scholars should be taken very seriously in considering proposals to encode semitic 
scripts, including objections to such proposals on grounds of script identity.

I do not think, however, that you are now achieving anything other than annoying people. I 
am not objecting to what you hope to achieve, only pointing out that you are failing to 
achieve it with your current strategy.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-21 Thread John Hudson
Dean Snyder wrote:
What do you suggest I, or others, do other than have such discussions?
Target precisely and selectively.
Pay attention to what Ken Whistler writes, as he can generally be relied on to precisely 
identify the basis on which a UTC decision might be expected. As he has noted, UTC 
approval of a proposal to separately encode Phoenician is most likely to be based on an 
expressed need to distinguish Hebrew and 'Phoenician' in plain text by a specific user 
community or communities. As he has also noted, expression of an absence of need from a 
user community does not constitute grounds for ignoring the expression of need from one or 
more other communities. Get it? The fact that semiticists do not need and many, 
apparently, do not want a separate encoding does not override a need expressed by someone 
else. The fact that *I* do not need and don't particularly want such an encoding does not 
override a need expressed by someone else. It should follow from this observation that 
*any* argument based on stating, restating or otherwise asserting semiticists' lack of 
need for such an encoding is futile *if* reasonable need is expressed by other users.

So there you have your selected and precise target: the need for plain-text distinction of 
Hebrew and Phoenician as expressed by other user communities. This is the only target that 
it is worth attacking, because it is the only target that offers the possibility of 
victory. You need to identify the user communities that believe they have a need to make a 
plain-text distinction, and you need to convince them that they don't really need it after 
all.

Be prepared, however, for the possibility that the expressed need may turn out to be 
legitimate.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson
[EMAIL PROTECTED] wrote:
In order for Phoenician to be disunified from Hebrew, it must
first have been unified with Hebrew.  This is not the case.
Okay, un-unified, non-unified, kept-separate-from ... pick your term.
At the moment Phoenician is neither unified nor non-unified with Hebrew *because no 
decision has been made by the UTC*. Lack of a decision implies neither unification nor 
non-unification. Phoenician is in the box with Schroedinger's cat.

John Hudson


Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson
Peter Kirk wrote:
This is not a practical use of variation sequences if, by this, you 
mean use of variation selectors. What are you going to do, add a 
variation selector after every single base character in the text? ...
...
... Are you expecting fonts to support the tiny stylistic variations 
between Phoenician, Moabite, Palaeo-Hebrew, etc. -- variations that 
are not even cleanly defined by language usage -- with such sequences?

No one has suggested this.
Then what is Ernest suggesting? He wrote that the distinction between stylistic variants 
of unified scripts could be done with variation sequences, i.e. a sequence that 'always 
consists of a base character followed by the variation selector, may be specified as part 
of the Unicode Standard'. He then went further and wrote:

My point was that I have seen enough evidence to
absolutely convince me that if both glyph repertoires
are unified in a single script, variation sequences
would be *necessary*. [My emphasis.]
So what is he suggesting if not that every single base character in a text would be 
followed by a variation selector character in order to make a plain-text distinction 
between stylistic variations?

Why not change the friggin' font? Why not use something other than plain-text?
The solution may be a catch-all, but the problem is a real one. Dr 
Kaufman's response makes it clear that to professionals in the field 
Everson's proposal is not just questionable but ridiculous. There is 
certainly some PR work to be done in this area, not name-calling.
Peter, are we talking about the same thing? Ernest is suggesting bizarre measures to deal 
with a problem -- in my opinion, a non-existent one -- that he sees in *unification*. You 
are arguing against Michael's *dis-unification*. The ridiculousness of Ernest's suggestion 
to use variation selector sequences -- indeed, perhaps he intends it to be ridiculous to 
make a point -- is an argument in favour of dis-unification, since the alternative for 
making a plain-text distinction is so daft.

My question, again, is whether there is a need for the plain text distinction in the first 
place?

John Hudson


Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson
Kenneth Whistler wrote:
A character encoding standard is an engineering construct,
 not a revelation of truth
Amen.
I begin to suspect that part of the problem -- the problem of interminable debate, not any 
technical problem -- is due in part to different perceptions of the Unicode Standard. It 
must seem pretty obvious to engineers that this is a standard for encoding characters and 
that implementing support for the standard does not, per se, imply much of anything about 
how users should encode text. This is perhaps less obvious to non-engineers -- i.e. to 
users --, and understandably so given the typical representation of Unicode to this 
audience: 'Now supports all the world's major living languages!'. It is evident from the 
Phoenician discussion that a good number of people -- intelligent people, and experts in 
particular fields -- expect UTC decisions on what characters to encode to influence user 
decisions on how to encode specific texts. I don't think this expectation is unreasonable, 
given their perception of the standard, and perhaps Unicode needs to do a better job in 
conveying what the standard is and does and how it can be used.

There remains, in the Phoenician debate, much fuss about Unicode disunifying what a 
particular set of people consider to be the same thing. Perhaps the point needs to be made 
more strongly that for practical text processing purposes *unification or disunification 
of Phoenician and Palaeo-Hebrew happens only at the point of encoding a particular text*. 
There is no reason at all why Semiticists cannot simply totally ignore the proposed 
Phoenician block. The important question then, it seems to me, is not whether to encode 
Phoenician or not, but how to better communicate that the encoding of a particular set of 
characters does not mean that they have to be used to encode particular texts or languages.

John Hudson


Re: Response to Everson Phoenician and why June 7?

2004-05-19 Thread John Hudson
Michael Everson wrote:
 There are already encodings
 suitable for all varieties of Northwest Semitic
 scripts.  One can legitimately argue, as some have,
 that there are still some problems with the Hebrew
 and Syriac encodings, but not that we need anything
 more for the other NW Semitic languages other than
 some nice FONTS!

Which would not address the plain-text requirement to distinguish the 
scripts qua scripts.
Michael, can you briefly outline the points regarding this 'requirement'? The only one 
that has been repeatedly referred to in this too-long discussion is the Tetragrammaton 
usage; I'm not sure whether that constitutes a requirement for plain-text or not. What are 
the other points?

In discussions of whether to encode individual characters/glyphs -- and now, it seems, 
scripts/styles --, much seems to be made of whether there is a requirement to make a 
distinction in plain-text, while the question of whether there is a requirement to use 
plain-text in the first place gets asked less often.*

*Except by Jony, who is always encouraging us to use markup to make distinctions.
John Hudson



Re: Response to Everson Phoenician and why June 7?

2004-05-19 Thread John Hudson
Ernest Cline wrote:
I would be very surprised if there were such a cybercafe.  One
that had both a Hebrew-Phoenician and a Hebrew-Hebrew font
with the Hebrew-Phoenician as the default would be much easier
to believe as a possibility.  Still, it is a valid point.  I think that if
Phoenician were to be unified with Hebrew, it would probably
behoove Unicode to establish variation sequences for Phoenician.

Even with a separate Phoenician script, it might be a good idea
to provide variation sequences that could be used to identify
different script styles such as Paleo-Hebrew and Punic
in the plain text.
This is not a practical use of variation sequences if, by this, you mean use of variation 
selectors. What are you going to do, add a variation selector after every single base 
character in the text? Are you expecting fonts to support the tiny stylistic variations 
between Phoenician, Moabite, Palaeo-Hebrew, etc. -- variations that are not even cleanly 
defined by language usage -- with such sequences?

Some people seem keen on variation selectors in the same way that others are keen on PUA: 
as a catch-all solution to non-existent problems.

John Hudson


Re: Qamats Qatan (was Response to Everson Phoenician and why June 7?)

2004-05-19 Thread John Hudson
Jony Rosenne wrote:
*Except by Jony, who is always encouraging us to use markup 
to make distinctions.

I don't recall saying anything like this in this Phoenician discussion.
Acknowledged. My point was not about that discussion in particular, but about the generic 
question of to what degree plain-text is a requirement, regardless of what one wants to do 
within it. Your frequent refrain that distinctions of shape, for what you consider to be 
the same character (and note that I am not agreeing or disagreeing with any particular 
judgement), should be handled in 'mark-up' presupposes something other than plain-text in 
terms of displaying that distinction. You frequently remind us that there are distinctions 
that are useful to some people, desirable in some circumstances, but which do not 
constitute a *requirement* in plain-text. Fair enough. For this same reason, I don't 
automatically accept the argument, made by Michael earlier today, that 'There is a 
requirement for distinction for X in plain-text'.

On what basis do we decide that X is necessary in plain-text while Y should be done with 
mark-up or some other 'higher level protocol'?

John Hudson


Re: Middle stroke of U+042D

2004-05-16 Thread John Hudson
So, since normal Russians are unaware of the variation in the middle
stroke of U+042D, and since russian typographers consider it a purely
decorative item, why would Mongolians think otherwise?
Indeed, if their goal were to deviate from Russian typographic
tradition they wouldn't have adopted the Cyrillic script in the first
place, right?...
What's then the story behind the alternate glyph for U+042D and its
rationale in the SIL Doulos font as given by the online document 
Doulos SIL 4.0 Font Documentation.pdf?
Some national communities have definite preferences about the form of 
specific letters, and it is perfectly legitimate for a typeface to 
address these preferences with variant glyphs as appropriate to the 
overall design. The best known Cyrillic preference is probably that of 
Serbian, Montenegrin and Macedonian communities for specific italic 
forms that differ considerably from the international norms established 
by typical Russian forms. The Mongolian preference referred to in the 
Doulos documentation is a little dubious, I think, because a) it 
concerns such a small detail and not a significant variation in 
letterform comparable to e.g. the Serbian italic forms, and b) unlike 
Serbian, Mongolian has only been written in Cyrillic for a short period 
of time and such variant preferences normally derive from long 
chirographic practice. Frankly, this Mongolian preference looks like the 
sort of thing that develops when a particular typeface in a particular 
style becomes recognised as the norm for writing a language, rather than 
as simply one stylistic possibility.

John Hudson


Re: OT [was TR35]

2004-05-12 Thread John Hudson
Jony Rosenne wrote:

Mozilla's main value is for non-Windows platforms.
And for people who are unimpressed by Outlook's security track record.

JH

--

Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Phoenician

2004-05-07 Thread John Hudson
Mark E. Shoulson wrote:

Obviously one can find experts on both sides of this debate.

Experts that need something should not be told 'You can't have it 
because we have other experts who don't like it.'
Indeed not, but they might actually want to be told 'Other experts have raised these 
specific issues about document encoding: have you considered these? do you think they are 
a problem? do you still want what you said you wanted?'

John Hudson

--

Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Just if and where is the sense then?

2004-05-06 Thread John Hudson
Peter Kirk wrote:
Eudora does not currently support Unicode, but the very excellent and 
*free* Mozilla Thunderbird e-mail program does. See: 
http://www.mozilla.org/products/thunderbird/

But does it display the Oracle's Yoruba correctly? Mozilla 1.6 doesn't. 
Or maybe it's the default plain text font.
The original question was about Unicode support, not about specific rendering technology. 
Thunderbird supports Unicode text encoding. So far as my tests indicate, Thunderbird on 
Windows uses Uniscribe for text layout, so if you have the appropriate version of 
Uniscribe in the program directory and the appropriate OpenType fonts installed, you 
should be able to enjoy exactly the same text rendering in Thunderbird as in Word.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


No new contribution :)

2004-05-05 Thread John Hudson
Mark E. Shoulson wrote:
I have a Tiqqun with pointed and cantillated STAM text, and it isn't the 
only one I've seen.
Cool. Can you give me the bibliographic information? Thanks.
I'm giving up on the Phoenician / Not Phoenician debate: nothing new is being said now, 
and the arguments are getting less persuasive, not more. This is a pity, because I'd hoped 
that the debate might produce a really convincing argument to encode or not to encode. I 
do agree with John Cowan's late comment that strong justification is desirable when 
proposing historic scripts, not least so we can avoid this kind of bandwidth-hogging 
debate in future.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Just if and where is the sense then?

2004-05-05 Thread John Hudson
R.C. Bakhuizen van den Brink [Rein] wrote:
How well does low-budget Eudora support Unicode?
Eudora does not currently support Unicode, but the very excellent and *free* Mozilla 
Thunderbird e-mail program does. See: http://www.mozilla.org/products/thunderbird/

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Just if and where is the sense then?

2004-05-05 Thread John Hudson
R.C. Bakhuizen van den Brink [Rein] wrote:
how well Unicode compatible are these operating systems? 

Eudora may be obsolete, but have you got any idea how big the installed 
base of Eudora is?
So bug Qualcomm to add Unicode support to Eudora. The idea that an international standard 
should be changed in order to overcome the shortcoming of particular software developers 
is clearly backwards.

But I'm still not sure exactly what you are suggesting should be done.
John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Yoruba Keyboard

2004-05-05 Thread John Hudson
African Oracle wrote:
This mail is written with the Yoruba Keyboard that was rolled out yesterday.
Please just look at the issue raised earlier.


Looking at the above it is obvious that the acute on top of the e and o with
dot below is a bit too high almost to the point of looking like a cedilla
under E.
That is entirely dependent on the font, regardless of how the diacritic is encoded. Quite 
simply, this is not an encoding issue.

In transit the acute and the grave could be removed by just putting the
cursor in between  and  because they are combined in a way that is not 
binding.

It even becomes a compounded problem during copying and pasting because the
accents occupy two cursor spaces. I still think with all these observations 
something must be done.
Cursor positioning is something that happens at a higher level than character encoding, 
and there are several different models for cursor positioning in combining character 
sequences, implemented by different software developers and often varying according to script.
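
For what it's worth, a small demonstration of how such a sequence is encoded and 
normalised in Python; the two-mark vowel is a single grapheme cluster however the 
cursor behaves:

    import unicodedata

    # e + COMBINING DOT BELOW + COMBINING ACUTE, as in the Yoruba vowel discussed.
    decomposed = "e\u0323\u0301"
    nfc = unicodedata.normalize("NFC", decomposed)

    print([hex(ord(c)) for c in nfc])
    # ['0x1eb9', '0x301']: the dot composes to U+1EB9, the acute remains a
    # combining mark; there is no fully precomposed form of this vowel.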

By the way, are you familiar with the A12n Collaboration project for African computing? 
Encoding, fonts and keyboard layouts for African languages have been quite extensively 
discussed on that mailing list, and there is a higher number of African participants than 
on the Unicode list. For details, see http://www.bisharat.net/ and, for mailing list 
subscription, http://lists.kabissa.org/mailman/listinfo/a12n-collaboration

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
No Georgian can read Nuskhuri without a key. I maintain that no Hebrew 
reader can read Phoenician without a key. I maintain that it is 
completely unacceptable to represent Yiddish text in a Phoenician font 
and have anyone recognize it at all.
But no one is going to do that. No one is talking about doing that. This is a complete 
irrelevancy.

JH
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Nice to join this forum....

2004-05-04 Thread John Hudson
Michael Everson wrote:
This is no different from Welsh:
A B C CH D DD E F FF G NG
All of those are considered letters in the Welsh alphabet. They are 
all significant. But that doesn't mean that ch and dd get encoded 
as single entities. They write c + h and d + d.

In Yoruba, you treat gb as a letter. That is fine. But you encode it 
with g + b.
Isn't there something in the FAQ about this? We've been through the discussion of digraph 
(and trigraph and tetragraph) encoding several times, and generally confusion stems from 
not understanding that higher level protocols are expected to handle rendering and things 
like sorting and spellchecking.
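
As a sketch of what 'higher level' means here, a toy collation in which gb, encoded as 
plain g + b, nevertheless sorts as one letter (the alphabet fragment is illustrative, 
not a full Yoruba tailoring):

    ALPHABET = ["a", "b", "d", "e", "f", "g", "gb", "h"]
    RANK = {letter: i for i, letter in enumerate(ALPHABET)}

    def sort_key(word):
        key, i = [], 0
        while i < len(word):
            # Longest match first: g + b is treated as the single unit 'gb'.
            if word[i:i + 2] in RANK:
                key.append(RANK[word[i:i + 2]])
                i += 2
            else:
                key.append(RANK[word[i]])
                i += 1
        return key

    print(sorted(["gba", "ga", "ha"], key=sort_key))  # ['ga', 'gba', 'ha']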

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
  Hebrew has the same 22 characters, with the same character properties.
And a baroque set of additional marks and signs, none of which apply to 
any of the Phoenician letterforms, EVER, in the history of typography, 
reading, and literature.
And a baroque set of additional marks and signs, none of which apply to any of the STAM 
letterforms...

I'm not arguing against the 'Phoenician' proposal: I just don't find many of these 
arguments very convincing. The fact that one style of lettering sometimes has combining 
marks applied and another doesn't does not seem a compelling reason not to unify them.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
If you people, after all of this discussion, can think that it is 
possible to print a newspaper article in Hebrew language or Yiddish in 
Phoenician letters, then all I can say is that understanding of the 
fundamentals of script identity is at an all-time low. I'm really 
surprised.
I can't believe anyone is even talking about typesetting newspapers in Hebrew or 
'Phoenician' letters: this is a total irrelevancy. I wouldn't typeset a Russian newspaper 
in 'vyaz style letters, either, but that doesn't make it a separate script from Cyrillic. 
Treating particular letterforms as glyph variants of existing characters does not imply 
that these letterforms are suitable for any text that might be encoded with those 
characters. So far as I can tell, no one is arguing such nonsense.

The issue is not whether Palaeo-Hebrew letterforms are readable by modern Jews, or whether 
they may be used in religious texts -- and I note that you are not suggesting that STAM 
should be separately encoded, even though it is the *only* style approved for use in Torah 
scrolls --: the issue is how ancient texts should be encoded.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Mark Davis wrote:
The question for me is whether the scholarly representations of the Phoenician
would vary enough that in order to represent the palaeo-Hebrew (or the other 
language/period variants), one would need to have font difference anyway. If so,
then it doesn't buy much to encode separately from Hebrew. If not, then it would
be reasonable to separate them.
Given the sophistication of today's font technology, I don't think the encoding question 
can be addressed in this way. Regardless of whether 'Phoenician' letterforms are 
separately encoded, it is perfectly easy to include glyphs for these and for typical 
Hebrew square script (or any of a number of other different Hebrew script styles) in a 
single font. If the 'Phoenician' forms are not separately encoded, they can still be 
accessed as glyph variants using a variety of different mechanisms. The question is 
whether the distinction is necessary in plain text.
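
A conceptual sketch of that glyph-variant approach, with hypothetical glyph names 
standing in for a real font's character map and layout tables:

    # Character -> default (square script) glyph, plus a stylistic-variant
    # substitution reachable through a layout feature; glyph names invented.
    CMAP = {0x05D0: "alef", 0x05D1: "bet"}
    VARIANTS = {"alef": "alef.palaeo", "bet": "bet.palaeo"}

    def shape(text, palaeo=False):
        glyphs = [CMAP[ord(ch)] for ch in text]
        return [VARIANTS.get(g, g) for g in glyphs] if palaeo else glyphs

    print(shape("\u05D0\u05D1"))               # ['alef', 'bet']
    print(shape("\u05D0\u05D1", palaeo=True))  # ['alef.palaeo', 'bet.palaeo']

The encoded text is identical either way; only the glyph selection differs.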

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Nice to join this forum....

2004-05-04 Thread John Hudson
Philippe Verdy wrote:
A problem, however, is that many such forms are found in unstable
orthographies, and are difficult to document adequately for inclusion in
proposals.

This last argument should not be a limitation to encode them. After all they are
used for living languages in danger of extinction, and even if documents using
them are rare, encoding them would help preserve these languages and help
the development of their literacy.
You misunderstand me. I was not indicating the scarcity of documents (although that can 
also be a problem), and I certainly wasn't suggesting that documentation problems should 
impede encoding. I'm talking about unstable orthographies, such that the documents you may 
have -- even as recent as thirty years ago -- do not necessarily reflect current usage in 
the country in question. Some African countries have strong language standardisation 
organisations, e.g. Ghana, but in others orthographies are being developed by individual 
linguists and missionary translators, and there may be competing orthographies and 
disagreement over which should be adopted as official. On the one hand, one can make the 
argument that anything that is used or has been used in documents should be encoded -- 
which is also the approach I would favour --, but then you are likely to get African 
governments asking 'Why did you encode that? We don't use that. It isn't official.' You 
also get software developers coming along wanting to know what they need to support for a 
given language, and you can't give them a clear answer because the orthographies are 
unstable. Again, none of these factors prevent encoding of new characters, but it is a 
good idea to be aware of the uncertainty in the writing of many African languages, and 
prepared to respond to queries or objections regarding specific characters.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
No Georgian can read Nuskhuri without a key. I maintain that no 
Hebrew reader can read Phoenician without a key. I maintain that it 
is completely unacceptable to represent Yiddish text in a Phoenician 
font and have anyone recognize it at all.

But no one is going to do that. No one is talking about doing that. 
This is a complete irrelevancy.

No, it is not. If Phoenician letterforms are just a font variant of 
Square Hebrew then it is reasonable to assume that readers of Square 
Hebrew will accept them in various contexts. Such as newspaper articles, 
or advertising copy, or restaurant menus, or wedding invitations. THAT 
is font switching.

I consider this fundamental to script identification.
Okay, then I fundamentally disagree with you. Good to have that clear.
How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from 
those which you want to encode, if 1:1 correspondence is not sufficient grounds for 
unification but visual dissimilarity is grounds for disunification?

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Christian Cooke wrote:
Surely a cipher is by definition after the event, i.e. there must be 
the parent script before the child. Does it not follow that, by John's 
reasoning, if one is no more than a cipher of the other then it is 
Hebrew that is the cipher and so the only way Phoenician and Hebrew can 
be unified (a suggestion you'll have to assume is suitably showered with 
smileys :-) is for the latter to be deprecated and the former encoded as 
the /real/ parent script?
The argument of at least some contributors to this discussion is that the 'Hebrew' block 
is misnamed. Even if one accepts that 'Phoenician' should be separately encoded, the 
Hebrew block should have been called 'Aramaic' :)

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-03 Thread John Hudson
[EMAIL PROTECTED] wrote:
That question misses being a 'leading question' slightly.  The easiest
answer for the respondent is No, as then no further explanation on
respondent's part is necessary.  Furthermore, if we are to believe
the allegations about these users, they are already performing this
reprehensible practice, and so have apparently surmounted any
objections they might have once held.
I wasn't suggesting that this should be the only question asked, but I thought that the 
questions apparently posed by Deborah Anderson to users (as reported by Dean on Saturday), 
while not being leading, are not designed to elicit specific objections. Simply asking 
people 'Are you in favour of encoding Phoenician?' doesn't state what the 
alternative may be. Unless the respondent is particularly knowledgeable about character 
encoding, he is unlikely to consider whether his needs are better met with new Phoenician 
characters or existing Hebrew ones, especially if the possibility of using the latter is 
not presented to him.

That said, I am very glad that Ms Anderson's further questions encourage users to review 
the Phoenician proposal and to comment on its merits.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-03 Thread John Hudson
[EMAIL PROTECTED] wrote:
Omniglot's Cyrillic page shows O.C.S. (Cyrillic 10th century) and
Cyrillic (1918 version) in the same graphic.  It's real easy to see
the similarity which is the reason for Cyrillic unification.
I suppose I consider the matter this way: visual similarity recommends unification, but 
visual dissimilarity does not always recommend disunification.

I'm not convinced that there is a *need* for the Phoenician encoding, so the matter seems 
to fall back on whether the desire of some people to encode ancient Phoenician, Moabite, 
Hebrew, Aramaic etc. texts in this way is sufficient reason to add the new characters. 
Also, it is not clear to me how widespread this desire is, especially among the community 
of scholars who are most likely to be working with such texts.

Again, I'm not opposing the encoding of 'Phoenician' on principle, but I do think it is 
more complex than Michael's proposal presumes, and that more consultation with potential 
users is desirable. I think one of the questions asked should be, frankly:

Do you have any objections to encoding text in
the Phoenician / Old Canaanite letters using
existing 'Hebrew' characters? If so, what are
these objections?
John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Nice to join this forum....

2004-05-03 Thread John Hudson
Philippe Verdy wrote:
I thought about missing African letters like barred-R, barred-W, etc... with
combining overlay diacritics (whose usage has been strongly discouraged within
Unicode).
Maybe a font could handle these combinations gracefully with custom glyph
substitution rules similar to the automatic detection of ligatures. But maybe
they should not, if Unicode will in the end encode these characters separately
without any canonical equivalence with the composed sequence.
Having spent weeks researching African orthographies a few years ago, I'm inclined to 
think that such barred letters should be separately encoded: they constitute new Latin 
letters, not combinations of elements within orthographies such as base letters and 
combining marks. A problem, however, is that many such forms are found in unstable 
orthographies, and are difficult to document adequately for inclusion in proposals.
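
As a minimal sketch of the two options, using Python's unicodedata module, and taking 
for illustration U+024C LATIN CAPITAL LETTER R WITH STROKE, which was in fact separately 
encoded in a later version of Unicode:

    # Base letter plus combining overlay versus a separately encoded
    # barred letter. Overlay diacritics have no canonical compositions,
    # so normalization never unifies the two spellings: there is no
    # canonical equivalence between them.
    import unicodedata

    combined = "R\u0336"  # R + COMBINING LONG STROKE OVERLAY
    atomic = "\u024C"     # LATIN CAPITAL LETTER R WITH STROKE

    print(unicodedata.normalize("NFC", combined) == atomic)  # False
    print(unicodedata.normalize("NFD", atomic) == atomic)    # True: no decomposition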

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-03 Thread John Hudson
Michael Everson wrote:
Expense.  Complication.  Delays while the encoding gets into the Standard
and thence into popular operating systems, with all the accoutrements
such as keyboard software.

None of those are reasons to stop encoding historic scripts.
No one is suggesting that these are reasons to stop encoding historic scripts. They may, 
however, be taken into account when deciding whether or not to encode an historic script 
that at least some people consider to be already encoded.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-02 Thread John Hudson
C J Fynn wrote:
More than once during this discussion, I've thought that something approaching a general 
principle might be stated as 'related dead scripts should be unified; their living 
descendants may be separately encoded'.

Where two 'related dead scripts' have substantial differences in shaping
requirements this might create major implementation difficulties on some
systems.
See continuation of my exchange with Rick: I was presuming that technical obstacles to 
unification were not at issue, since these should be considered first. If there are 
technical obstacles, obviously there is no point in debating the merits of unification.

With 'Phoenician', we appear to have no technical obstacle to either the unification of a 
number of ancient North Semitic scripts or, indeed, unification with the existing Hebrew 
block. Hence the debate: not, as Ken suggested, how to apportion the halves of the baby, 
but where to make the cut.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-02 Thread John Hudson
[EMAIL PROTECTED] wrote:
The Mesha Stele (otherwise known as the Moabite Stone) is already 
available in Hebrew script. What is the need for a separate encoding of 
the same text?

There are probably other transliterations of the text already available,
too, such as Latin.  Wouldn't it be nice to see the inscription displayed
in its original script, properly encoded?
This is a silly question, because the whole debate is about what constitutes 'properly 
encoded'. The Mesha Stele can be perfectly easily encoded using existing Hebrew codepoints 
and displayed in the Phoenician style with appropriate glyphs.

I'm not saying that this is necessarily the best encoding for the Mesha Stele, but I'm 
certainly not convinced that there is anything improper about it, or that having a 
separate encoding for those glyphs would be more proper.
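
By way of a minimal sketch (taking, for illustration, the name Mesha, mem-shin-ayin, from 
the opening of the inscription), the encoding is ordinary Hebrew plain text; the 
Phoenician-style display is entirely a matter of the font:

    import unicodedata

    # The name encoded with existing Hebrew code points. Whether it is
    # rendered with square-Hebrew or Phoenician-style glyphs is a font
    # choice; the underlying plain text is identical either way.
    mesha = "\u05DE\u05E9\u05E2"
    for ch in mesha:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+05DE  HEBREW LETTER MEM
    # U+05E9  HEBREW LETTER SHIN
    # U+05E2  HEBREW LETTER AYIN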

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-02 Thread John Hudson
[EMAIL PROTECTED] wrote:
While the fact that it's called Phoenician script doesn't prove anything
about its origin, it might be considered indicative of the path through
which the script was borrowed.  
Indeed. This is the point I made earlier: Greco-centric European scholarship of writing 
systems calls the script 'Phoenician' because the Greeks derived their alphabet from trade 
contact with the Phoenicians. As should be obvious from recent debate, semiticists look at 
the old Canaanite writing systems in a different way.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: ISO 15924

2004-05-02 Thread John Hudson
Michael Everson wrote:
The Unicode Consortium has been designated as Registration Authority for 
ISO 15924; I have been engaged by the Consortium to act as Registrar.

The ISO 15924 web site is now online at http://www.unicode.org/iso15924/
In the code lists at http://www.unicode.org/iso15924/iso15924-codes.html the 4-letter 
script codes are shown capitalised, e.g. Arab not arab, Armn not armn, etc. Is this 
intentional? Should the codes always be capitalised? Does it matter if they are not?
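
A minimal sketch of a normaliser, assuming the answer is that codes are matched 
case-insensitively but presented in the titlecase form shown in the lists:

    # Normalise a 4-letter ISO 15924 code to the titlecase form shown
    # in the code lists, e.g. "arab" or "ARAB" -> "Arab".
    def normalize_script_code(code: str) -> str:
        if len(code) != 4 or not code.isalpha():
            raise ValueError(f"not a 4-letter script code: {code!r}")
        return code.capitalize()

    assert normalize_script_code("arab") == "Arab"
    assert normalize_script_code("ARMN") == "Armn"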

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: For Phoenician

2004-05-02 Thread John Hudson
Asmus Freytag wrote:
At 10:25 AM 5/2/2004, Michael Everson wrote:
Do you really think it necessary that the proposal be a thesis 
reprising a hundred years of script analysis?
I think what's desirable is something of a summary that applies this 
analysis in a way that it can be related to the research. A thesis would 
imply original development.
Some acknowledgement that there is disagreement in this field would also be welcome. I 
don't think there is anything wrong with saying 'this encoding unifies the following 
writing systems based on this analysis', while also acknowledging that this is not the 
only possible analysis and that some scholars may disagree or propose alternative 
analyses. Michael can, quite often, give the impression that he is objectively right and 
everyone who disagrees is an idiot, while I have tried to suggest, throughout this debate, 
how opposing views are based on variant analyses of old Canaanite writing systems by 
experts in different fields.

Michael thought, based on his analysis and that of the studies he consulted, that encoding 
of Phoenician was 'simple and obvious'. Whatever else he is right about, he seems to have 
been wrong in this: many of the people most likely to be working with old Canaanite texts 
in various languages -- i.e. semiticists -- seem to consider Phoenician anything but 
simple and obvious.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-02 Thread John Hudson
[EMAIL PROTECTED] wrote:
This is a silly question, because the whole debate is about what constitutes 'properly 
encoded'. The Mesha Stele can be perfectly easily encoded using existing Hebrew 
codepoints and displayed in the Phoenician style with appropriate glyphs.

I'm not saying that this is necessarily the best encoding for the Mesha Stele, but I'm 
certainly not convinced that there is anything improper about it, or that having a 
separate encoding for those glyphs would be more proper.

There's nothing improper about transliteration.  Likewise, the Phoenician 
inscription of Edessa in Macedonia could be easily encoded using existing 
Hebrew code points, even though its language is Greek.
Again, you are missing the point because you are *assuming* that encoding the Mesha Stele 
with Unicode Hebrew characters = transliteration, i.e. that there is some other encoding 
that is more proper or even 'true'. The contra-argument is that the 'Phoenician' script is 
identical to the Hebrew script, the differences in letterforms being merely glyphic 
variants. The contra-argument disagrees with your premise that encoding the Mesha Stele 
with Hebrew characters is transliteration. You can't proceed past that argument simply by 
restating your premise.

I'm not saying that I agree wholeheartedly with the contra-argument, but I don't think you 
can duck the argument by begging the question.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Philippe Verdy wrote:
Let's keep Hebrew clean with only modern Hebrew and traditional pointed
Hebrew... The religious traditions in Hebrew are too strong to allow importing
into it some variants and marks coming from separate Phoenician branches used by
non-Hebrew languages.
The 'religious traditions' in Hebrew are specific to the *letters* (sometimes, 
incorrectly, identified as the 'consonants') of the Torah, i.e. to those letters that, at 
the time of the writing, were identical to those used for Aramaic, Moabite, Phoenician, 
etc. Vocalisation marks are a later convenience, and the existence of multiple pointing 
systems in Jewish use is evidence that, in religious terms, marks are unimportant.

Again, I am not opposing the encoding of 'Phoenician': I just want to see the real issues 
resolved. To my mind, there is essentially only one major issue in encoding the ancient 
North Semitic script separately from Hebrew: how should users encode Palaeo-Hebrew texts? 
With the new codepoints, or with the Hebrew codepoints? The text is Hebrew, but the 
appropriate glyph forms are ancient North Semitic. I do think there is the possibility of 
significant confusion, which is not grounds for refusing to encode the ancient North 
Semitic script, but does suggest that a specific recommendation should be made, either in 
the TUS or by an appropriate and representative scholarly body.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Michael Everson wrote:
Okay, perhaps we're getting somewhere and beginning to understand each 
other. What you are saying, in effect, is that there is already a de 
facto unification of Phoenician and Hebrew encoding, employed by a 
significant user group.

But there is no de facto unification. This script has been on the books 
for ages. This script has been described by historians of writing as 
distinct from Hebrew for two hundred years.
There does appear to be a de facto unification *in the practice of many semiticists*, 
particularly when dealing with ancient Hebrew texts written with 'Phoenician' letters. I 
think this is perfectly understandable, given that the language is Hebrew regardless of 
what the letters look like: why would they have thought to encode such texts as anything 
other than Hebrew? I'm not saying that this is a reason not to encode what I favour 
calling the 'Ancient North Semitic' script, only that we need to acknowledge that there is 
a genuine question about how ancient Hebrew texts in that script should be encoded. Simply 
repeating 'They're different scripts' is not addressing that question. It is very obvious 
that, among semiticists, there is a prevailing concept of a single 22 letter abjad. I'm 
not saying that this concept should determine what gets encoded in Unicode, only that we 
should acknowledge and address the confusion that stems from the incompatibility of 
contrary views of the semitic script universe.
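
The 1:1 relationship behind that concept is easy to make concrete. A sketch, using the 
code points of the eventual Phoenician block (U+10900..U+10915) and folding Hebrew's 
five final forms onto their nominal letters:

    # Fold the five Hebrew final forms, then map each of the 22 base
    # letters, in abjad order, onto the corresponding Phoenician letter
    # (U+10900 ALF .. U+10915 TAU).
    FINALS = {
        "\u05DA": "\u05DB",  # final kaf   -> kaf
        "\u05DD": "\u05DE",  # final mem   -> mem
        "\u05DF": "\u05E0",  # final nun   -> nun
        "\u05E3": "\u05E4",  # final pe    -> pe
        "\u05E5": "\u05E6",  # final tsadi -> tsadi
    }
    HEBREW_22 = [c for c in map(chr, range(0x05D0, 0x05EB)) if c not in FINALS]
    TO_PHOENICIAN = {h: chr(0x10900 + i) for i, h in enumerate(HEBREW_22)}

    def transliterate(hebrew_text: str) -> str:
        # Every letter maps one-to-one; anything else passes through.
        return "".join(TO_PHOENICIAN.get(FINALS.get(c, c), c) for c in hebrew_text)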

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Michael Everson wrote:
At 19:10 -0700 2004-04-29, John Hudson wrote:
Michael, Peter is not talking about the Phoenician language being 
represented in the Hebrew script, he is talking about the common 
practice of semiticists to *encode* the Phoenician script using Hebrew 
codepoints. The representation of the text is in Phoenician glyphs, 
not Hebrew, but these glyphs are treated as typeface variants of Hebrew.

I have plenty of fonts where the Phoenician glyphs are treated as 
typeface variants of Latin.
But presumably these are not used to write English text or, for that matter, Latin. The 
issue at question is the encoding of *Hebrew* text as written in Phoenician-style letters.

This isn't a show-stopper, but I've asked several times now how you and others think 
semiticists should encode such text: with Hebrew characters corresponding to the language 
of the text, or with 'Phoenician' characters corresponding to the look of the text?

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Michael Everson wrote:
You would encode the text in Phoenician script if you wanted to encode 
it in the script in which it was originally written. You would encode 
the text in Hebrew script if you wanted to encode it in the script in 
which it was later written (after the Exile) and if you wanted to encode 
it in the script in which it is currently written.
Sigh. Are you just not getting the question? The people who have an issue with this do not 
recognise two distinct scripts, and they are not going to recognise two distinct scripts 
whether Unicode encodes them as such or not. You can repeat yourself as often as you like, 
but this is not about wanting to encode in one script or another, this is about people 
who, because of the field of expertise in which they work, are dealing with texts in a 
language that is normally encoded using a particular set of *characters* being given 
another set of characters *for the same texts*.

On the one hand, the obvious recommendation would be to tell semiticists to continue doing 
what they have been doing: encoding as Hebrew and displaying with Phoenician-style glyph 
variants, as this enables textual analysis and comparison with a larger body of Hebrew 
text in which such experts are likely to be interested. But your proposal specifically 
states that the 'Phoenician' characters should be used to encode Palaeo-Hebrew, as if 
somehow Hebrew and Hebrew are different languages when they look different.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Ernest Cline wrote:
Is this controversy over Paleo-Hebrew occurring in any context
other than the tetragrammaton?
Yes. Use of this style of lettering for the Tetragrammaton is a very late development and, 
despite its importance in the argument that a plain text distinction needs to be made 
between Hebrew and Phoenician, it is really irrelevant to what I am talking about.

What I'm referring to is the body of inscriptional and numismatic text from a period of c. 
700-800 years in which the Hebrew language is written in the common North Semitic script 
that is covered in Michael's proposal. The point is that this is all Hebrew language text, 
easily encodable with existing Hebrew characters, and semiticists have a practical 
interest in not making a distinction in the corpus of Hebrew text based on the style of 
lettering used. This is why -- despite supporting the encoding of the ancient North 
Semitic script, and thinking that Michael's proposal is really good -- I think this 
question deserves to be addressed with something other than arrogance and defensiveness. 
Michael seems to be taking it personally -- as if questioning the proposal implies 
questioning his credentials or knowledge -- but all I'm personally questioning is the one 
sentence in which he says the new Phoenician characters should be used for 
Palaeo-Hebrew. I'm not sure that this is the best recommendation to make to the people who 
actually work with Palaeo-Hebrew.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
[EMAIL PROTECTED] wrote:
On the one hand, the obvious recommendation would be to tell semiticists to 
continue doing what they have been doing: encoding as Hebrew and displaying 
with Phoenician-style glyph variants, as this enables textual analysis and 
comparison with a larger body of Hebrew text in which such experts are 
likely to be interested.

To which Michael will likely reply that encoding Phoenician will not prevent
this behavior.
And he will be right.
While I sympathise with the view that the 22CWSAs can be legitimately viewed as a single 
script -- *all* categorisations being arbitrary, after all -- and recognise that there 
are very good reasons for this view in some contexts, I do not oppose on principle the 
distinction of the ancient North Semitic script from 'modern' Hebrew in the context of 
Unicode. Mind you, I also wouldn't be at all fussed if your view were to prevail and the 
proposal were dropped, however unlikely this now seems.

My concern is only with what the standard ends up saying about this script, and whether it 
generates confusion among scholars who are for legitimate reasons not used to considering 
-- let alone working with -- Hebrew and the common North Semitic script as distinct.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-30 Thread John Hudson
Michael Everson wrote:
Scholars of writing systems have always recognized the distinction. No 
one teaches that The Greek script is derived from the Unified 
Twenty-Two Character West Semitic Abjad. They teach that The Greek 
script is derived from the Phoenician script. They certainly do not 
teach that The Greek script is derived from the Hebrew script.
Michael, I've read the same books that you have. Scholars of writing systems -- unless 
they are specialists in narrow fields, like Solomon Birnbaum -- tend to be generalists, 
and their intent is to classify, categorise and arrange writing systems in neat 
chronological tables. That's fine; that's their job.

But the majority of users of the Unicode standard are not scholars of writing systems, and 
the classification, categorisation and arrangement of scripts -- and I'll remind you that 
on the Qalam list Peter Daniels, a very noted scholar of writing systems, questioned the 
whole concept of 'scripts' -- is not what they are worrying about when they sit down to do 
*their* jobs. Semiticists have some particular concerns about your proposal that stem from 
how they do they job. Telling people whose job involves viewing the relationship of 
Near-Eastern writing in a particular way that a bunch of people who do a different job 
view it differently is not helpful.

Again, again, again: I am not opposing the encoding of the ancient North Semitic script 
under whatever name separate from 'modern' Hebrew, even though I don't think the 
distinction between the two is so clean as you claim. It is clean enough for most users. I 
just want to see it encoded and documented in such a way that it does not generate any 
more confusion than necessary for those users for whom the distinction is not only untidy 
but, in their work, traditionally non-existent.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-04-29 Thread John Hudson
Mark E. Shoulson wrote:
This sounds a lot like what is being proposed, modulo a name-change: 
we're working on a Samaritan proposal, Hebrew's already there, and 
Michael has proposed Old Canaanite, which for some reason he has chosen 
to call Phoenician.  The name may be ill-chosen, and it isn't too late 
to change it, but it sounds like you're in general agreement with me and 
Peter Kirk.
Mark, are you sure that you and Peter are in general agreement? Peter seems to be opposing 
the encoding of Old Canaanite / Phoenician / Ancient North Semitic outright, while you and 
Dean seem to be supporting some kind of unified encoding for some subset of ancient 
Near-Eastern scripts separate from the existing Hebrew block. [On the question of Aramaic, 
the agreement seems closer.]

Today I received my long-sought copy of Birnbaum's _The Hebrew Scripts_ (Brill, 1971), and 
immediately noted the following comments (vol. 1, p. 34):

These documents*, which do not themselves come within
the scope of Hebrew palaeography, are here given because
they have been utilised for the dating of Palaeo-Hebrew
material. In this procedure we are following the general
practice of the Semitic palaeographers who have treated
the scripts of Phoenicia, Palestine, Moab and Aram as a
unity, i.e. North Semitic. For the early centuries there
can be no objection to that. For these inscriptions show
that any regional differentiation would as yet have been
so slight as to be practically negligible. Hence they can
serve to tell us how the Palaeo-Hebrew writing looked at
a period from which we have no archaeologically datable
Palaeo-Hebrew documents. That we are on safe ground here
is corroborated by recent discoveries
   To apply the term Phoenician to the script of the
Hebrews is hardly suitable. I have therefore coined the
term Palaeo-Hebrew.
*23 plates of inscriptions identified by Birnbaum as Phoenician, Aramaic and Moabite.
This text, along with this illustration from the second volume (figures 019 to 1)
http://www.tiro.com/view/NorthSemitic.jpg
is instructive. It at once suggests the unification of ancient North Semitic scripts on 
the basis of 'practically negligible' differentiation, while at the same time insisting on 
the distinct identity of Palaeo-Hebrew even when, as the illustration shows, it is 
virtually identical to contemporary forms of the other scripts. Clearly, for Birnbaum, 
Hebrew palaeography is firstly the study of Jewish writing in the Hebrew language -- 'the 
script of the Hebrews' -- a priority that informs the distinction between Hebrew and 
'Phoenician by any other name' similar to that manifested in Michael's proposal. At the 
same time, looking at the illustration, it is easy to see the point of the objections to 
the proposal. Birnbaum -- a palaeographer who can speak with confidence on the minute 
details that distinguish Palestinian from Egyptian forms of the maaravic style -- might 
make the distinction of his Palaeo-Hebrew from contemporary Phoenician, Moabite and 
Aramaic writing on visual grounds even if he did not intend to on ethnic grounds, but I 
doubt many other people could.

I offer this not in support of arguments for or against Michael's proposal, but to try to 
illustrate the basis of the disagreement.

John Hudson
--
Tiro Typeworks   www.tiro.com
Vancouver, BC    [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy

