Re: Vertical BIDI

2004-05-19 Thread Timothy Partridge
Philippe Verdy recently said:

 From: [EMAIL PROTECTED]

  What's uncertain is whether a lr or a rl progression is favored, given the
  paucity of evidence.  Michael favors lr progression.  There is no question
  that the text is read BTT.

 This creates an interesting problem: Put in the same sentence Han (Chinese) and
 Mongolian words in a vertical layout (I don't think this is unlikely, as
 Mongolian is also spoken in China, and there's also a Chinese community in
 Mongolia). So Chinese ideographs will be laid out vertically from top to bottom
 (but not rotated, except for a few characters like ideographic punctuation marks
 or symbols), and Mongolian will be laid out from bottom to top in their normal
 stack orientation. Such a text is clearly bidirectional, so we would need BiDi
 processing to order glyphs correctly.

John's comment refers to Ogham. Mongolian goes top to bottom.

 Now try including some Latin words in this text (also not unlikely: there are
 lots of trademarks and people names that will need to be written with their
 normal Latin characters). If the text is presented vertically, there's a
 legitimate question of whether Latin should be rotated (but it will keep the Han
 flow direction.)

Latin and Cyrillic are rotated 90 degrees clockwise when mixed with
Mongolian in vertical lines. Presumably Arabic would be rotated 90 degrees
anti-clockwise. (The ancestor of Mongolian was so rotated, which is why the
vertical lines go left to right.) One amusing aspect is that punctuation like
? and ! stay vertical at the end of Mongolian sentences, but are rotated at the end
of Latin and Cyrillic ones.

Mongolian is somewhat unusual in that nowadays when it is written in
horizontal lines, it is rotated a further 90 degrees so it goes left to
right and is upside down compared to the ancestral script.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: vertical direction control

2004-03-24 Thread Timothy Partridge
Peter Kirk recently said:

 It seems strangely inconsistent to me that Unicode has detailed controls 
 for horizontal layout direction and the complex bidi algorithm, but has 
 nothing for vertical layout. I can force Latin text to be rendered right 
 to left or Hebrew left to right (although such overrides are hardly 
 plain text issues), but there is no way I can select vertical layout 
 even for languages in which that is a normal way of writing. We already 
 have U+202A LEFT-TO-RIGHT EMBEDDING and U+202B RIGHT-TO-LEFT EMBEDDING. 
 It would be easy to define new characters TOP-TO-BOTTOM EMBEDDING and 
 BOTTOM-TO-TOP EMBEDDING, with similar scope until the next PDF 
 character. The difficult part would be implementing this, and before 
 that defining the exact semantics (but Unicode could define the 
 semantics as beyond its scope). (Another problem would be deciding which 
 variant of mirrored characters e.g. brackets to use given that the 
 context is neither RTL nor LTR - this is a problem with Egyptian 
 hieroglyphs, many of which are mirrored in horizontal text.)

For Egyptian hieroglyphs the characters generally face towards the start of
the reading direction. (The occasional one is reversed, and sometimes whole
texts face the wrong way.) So for horizontal l-to-r t-to-b face left, r-to-l
t-to-b face right. For vertical t-to-b l-to-r face left, t-to-b r-to-l face
right. In this case the fact that the inscription is top to bottom doesn't
help - you need to know what the column arrangement is. You can even have
both arrangements in one inscription, e.g. on either side of a doorway the
figures face towards the door. (The bit over the door had the same
arrangement as one of the sides rather than meeting halfway in the example
I've seen.) IIRC it's like

RLL
R L
R L

Captions next to people in a larger picture usually face in the same
direction as the person.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Unwanted publicity?

2004-01-28 Thread Timothy Partridge
I was somewhat surprised to see the word Unicode on page 8 of the Metro
newspaper (London, UK) today (January 28, 2004).

Unfortunately it was in the middle of an article about Mydoom, where it says:
"The message may read 'The message contains Unicode characters and has been
sent as a binary attachment.'" This was the only one of the possible
messages they quoted, presumably because it was the most distinctive.

The name Unicode is now in mailboxes around the world - is this a good or
bad thing?

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: What is a process?

2003-11-26 Thread Timothy Partridge
Peter Kirk wrote:

 As there hasn't been a rush of on-list responses to this one, and partly 
 in reply to the one off-list response, let me clarify the issue I 
 have in mind.

 Instance A of a program P, version X, writes a Unicode character string 
 S, in a particular normalisation form, to a storage medium Z. Some time 
 later (maybe seconds, maybe years) instance B of version Y of that same 
 program P reads that string from the same storage medium. For the 
 purposes of Unicode conformance, are instances A and B to be considered 
 one process or separate processes?

I would say a process is something that carries out some sort of task on
data. Typically data both comes in and goes out. It might be to the outside
world or to a data store. 

 Conformance clause C9 states that no process can assume that another 
 process will make a distinction between two different, but 
 canonical-equivalent character sequences, which implies that no process 
 can assume that another process has correctly normalised any character 
 sequence. So, if instances A and B are considered separate processes, B 
 is not permitted to assume that the string S has been correctly 
 normalised - even if in fact it is known that all strings on medium Z 
 have been written by program P and that all versions of program P write 
 strings in a particular normalisation form.

I would consider A and B to be different versions of the same process. I
read the word assume to mean make an assumption without definite knowledge.
If process B *knows* something is true it can exploit that knowledge. If on
the other hand it is receiving data from a process outside its control
(owned by a third party perhaps) then it can't guess that the data have any
particular characteristics. It is common for a process to be composed of
sub-processes. If they can't exploit their knowledge of one another then you
have serious problems. To take an extreme case how could you call a
normalisation process if you couldn't rely on it returning normalised data? 
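
The distinction between knowing and assuming can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of NFC are assumptions made for the example, not anything the conformance clauses prescribe.

```python
import unicodedata

FORM = "NFC"  # assumed: the form every version of program P is known to write


def read_from_own_store(s: str) -> str:
    # Data written by another version of P: the form is known, so use it as is.
    return s


def read_from_third_party(s: str) -> str:
    # Nothing is known about the producer, so normalise before relying on the form.
    return unicodedata.normalize(FORM, s)


composed = "\u00e9"     # e with acute, precomposed
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# The two spellings are canonically equivalent but differ as code point sequences.
print(composed == decomposed)
print(read_from_third_party(decomposed) == composed)
```

The second function is the "normalisation process" of the extreme case: its caller relies on the result being normalised, which is exactly the kind of knowledge a sub-process is allowed to exploit.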

 Also, can the storage medium Z be considered a process?

No, it is a data store.

 Or can low-level 
 transformations of the data, e.g. defragmentation, backup and 
 compression, which are invisible to the program P be considered 
 processes? If so, these processes are permitted to transform S into a 
 canonically equivalent form; and so instance B of program P is not 
 permitted to assume that the string it reads from Z is in the same 
 normalisation form as the string written by instance A.

At some point your system will make use of a data store. It is entitled to
assume that what it gets out of the store is what was stored into it. The
operating system might make invisible compressions or duplications, but the
system using the data store is oblivious to them. If the operating system
doesn't return what was put in then it doesn't qualify for an *invisible*
change. I would expect the operating system documentation to make very clear
if the storage routines don't return what you gave them in the first place.

Tim 

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: Punctuation symbols for partial cuneiform characters

2003-09-04 Thread Timothy Partridge
John Cowan recently said:

 No, indeed.  Even the hopelessly innumerate should be able to grasp
 the ceiling and floor functions, however:  the floor of four and a half
 is four, whereas its ceiling is five.  Some speak of rounding down and
 rounding up respectively.

The hopelessly innumerate might get confused with minus four and a half. The
floor is minus five and the ceiling is minus four. (The floor goes towards
minus infinity not zero.)
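
For anyone who wants to check, a small Python sketch using the standard math module; the helper name is made up for the example. Truncation is included because it is the operation that goes towards zero.

```python
import math


def floor_ceil_trunc(x: float) -> tuple:
    # floor goes towards minus infinity, ceil towards plus infinity,
    # trunc towards zero.
    return (math.floor(x), math.ceil(x), math.trunc(x))


print(floor_ceil_trunc(4.5))   # (4, 5, 4)
print(floor_ceil_trunc(-4.5))  # (-5, -4, -4)
```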

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Timothy Partridge
John Cowan recently said:

 Marco Cimarosti scripsit:

  You could generalize it a bit: Alignment Of Metric And Imperial Units Whose
  Difference Is So Small As To Be Pointless.
  
  E.g., I never understood why on earth metres and yards should be kept
  different. In a public park somewhere in UK or Ireland I have seen the
  following sign:

 Because the yard isn't just an isolated unit, like the pound in various
 European countries.  It's part of a coherent (if profoundly messy) system.
 If we reduce the yard by 9%, the inch has to shrink too, and the last
 thing we want is to try to fit a 1/4 inch bolt (6.35 mm) into a nut
 whose inside diameter is only 5.81 mm.  It's bad enough to have to have
 two kinds of hardware already: having incompatible things both labeled
 1/4 inch would be the facilis descensus Averno indeed.

In the UK the inch is now defined as 25.4mm rather than a subdivision of a
standard yard kept under lock and key. If you peruse electronics catalogues
you will discover that many components have leads spaced at a pitch of
2.54mm which seems a remarkable degree of accuracy. When I was younger they
were a nice round 0.1 inch.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: Encoding: Unicode Quarterly Newsletter

2003-03-11 Thread Timothy Partridge
Ken recently said:

 Not to disagree publicly with Michael or Mark on this, but
 in the interests of accuracy, I should point out that if the
 rest mass of the Unicode 4.0 publication is assumed to be exactly
 4.1 kg (which then would, indeed, also be the case on our
 moon, or even a Jovian moon), and ignoring any relativistic
 corrections for relative motion -- since it is unlikely that
 anyone will be reading the standard while it is moving at
 a significant fraction of the speed of light -- then we can
 calculate the weight as being *approximately* 9.05 pounds
 (avoirdupois) [or 10.99 troy pounds].

I think relative motion cannot be ignored. The subjective weight will be
much higher if the book is dropped on the reader's foot. Perhaps it should
have very soft covers.

Will the book have on the back cover a list of the languages that can be
written with Unicode, and if so, what type size will be used?

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: Small Latin Letter m with Macron

2003-01-21 Thread Timothy Partridge
John Hudson recently said:

 At 12:29 PM 1/16/2003, Timothy Partridge wrote:

 Charles Trice Martin wrote The Record Interpreter which lists words in
 record type and their expansion. The 2nd Edition (1910) has been reprinted
 many times. The 1999 reprint is a facsimile of the 1910 edition, rather than
 being re-typeset.

 The other standard text, which has the added benefit of being more 
 international than _The Record Interpreter_, is Cappelli's _Lexicon 
 abbreviaturarum_ . [snip]

The abbreviated text in Cappelli is mostly handwritten (though in the
introductory bits he does use 9 for a con sign and 2 for a round r!). I
mentioned Martin because the abbreviation symbols are typeset.

One challenge for representing abbreviations in plain text (as opposed to
fancy) is the use of superscripts to mean "some letters, including this
one, have been omitted here". Meaning is lost without the superscripts.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Small Latin Letter m with Macron

2003-01-21 Thread Timothy Partridge
John Jenkins said:

 On Thursday, January 16, 2003, at 01:29 PM, Timothy Partridge wrote:

  Yes, especially early printing of Latin documents. See for example
  Gutenberg's bibles.
 

 Well, for that matter, even current editions of Spenser's _Faerie 
 Queene_ will use the occasional õ for on, and so on.

At least as late as the 1970s the English Statutes in Force had Magna Carta
in abbreviated Latin with English translation. It dates from 1297. Quite a
lot of it has survived. Much of the sections about the liberties of the
forest have been repealed because you can't go around killing wildlife in
forests these days.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Small Latin Letter m with Macron

2003-01-16 Thread Timothy Partridge
Christoph Päper recently said:

 Kenneth Whistler:
  Christoph Päper asked:
 
  writing mm  as only one m with a macron above.
 
  Handwritten forms and arbitrary manuscript abbreviations
  should not be encoded as characters.

 Although I've got no proof for it, I was told that it has also been used in
 print.

Yes, especially early printing of Latin documents. See for example
Gutenberg's bibles.

In the nineteenth century, in England, many old handwritten records were
printed in record type. This is like ordinary type but contains extra
characters for the abbreviation marks. (It is in a typical serif font, not a
handwriting style font.) I think the reason for reproducing in the condensed
form rather than expanding the abbreviations, was that some abbreviations
have more than one interpretation. For legal records an incorrect expansion
can have a significant effect. The literal transcription reduces this risk.
(It still requires someone to read the old handwriting correctly.)

Charles Trice Martin wrote The Record Interpreter which lists words in
record type and their expansion. The 2nd Edition (1910) has been reprinted
many times. The 1999 reprint is a facsimile of the 1910 edition, rather than
being re-typeset.

 Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Mongolian Encoding

2002-12-18 Thread Timothy Partridge
You recently said:

 On Mon, 16 Dec 2002 09:30:10 -0800 (PST), [EMAIL PROTECTED] wrote:

  I think that it is intended to use the equivalent Tibetan character sequences
  to produce the various types of Biruga, rather than MFVSs.

 Sounds eminently sensible and Unicode-like to use Tibetan symbols for Mongolian
 where appropriate. Is the following what you're suggesting ?

 1st variant form = U+0F04
 3rd variant form = U+0F04, U+0F05
 4th variant form = U+0F04, U+0F05, U+0F05

Yes. It's just my suggestion though. We'll have to see what everyone else thinks.

  This does raise an issue
  over the rotated variant but that perhaps could become the standard glyph for
  the character in the Mongolian block.

 Is it possible to change the standard glyph for a character once it has been
 carved in stone on the Unicode code charts ? And if it were possible, then how
 would the horizontal form be represented ? There is no exactly corresponding
 form in the Tibetan block.

Oops, I was reading my mail remotely and didn't have any books available and
my memory failed me. You are quite right. I think we do need a variation
selector for that variant.

On the issue of glyphs, I think I am right in saying that Unicode doesn't
standardise these. The ones in the code charts are just examples to aid in
identifying the character. Font designers can do whatever they fancy, but if
all their letter As came out looking like Bs they wouldn't be popular. The
glyphs on the Mongolian code chart are especially unusual since some obscure
variants have been picked to provide unique glyphs for the characters across
the various sub-scripts. I would have thought that a keyboard for typing
Sibe, say, would just have isolated / initial forms on the keys. 

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: In defense of Plane 14 language tags (long)

2002-11-05 Thread Timothy Partridge
Doug Ewell recently said:

 1.  Language tags may be useful for display issues.

Another use for language tagging is the correct formation of ligatures. E.g.
fi ligature is fine in English, but causes problems in Turkish because of
confusion with undotted i. 

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Variant selectors in Mongolian

2002-07-11 Thread Timothy Partridge

Ken Whistler recently said:

  The value of the 
  variant selector to the user is in knowing what the result is going to be, 
  and this means that the variant form *must* be specified. 

 It is. See above.

  How else can the 
  variant selector be used to *select* a particular form? Selection implies a 
  deliberate choice, not a willingness to accept any substitution a font 
  might provide.

 I agree. Although variation selectors also imply willingness to accept
 fallback to default glyphs as legible alternatives, if not the
 desired alternatives.

I'd like to suggest a particular example to clarify what you expect to happen.

If the computer is asked to render toeroen (which is the penultimate word on
page 547 of The World's Writing Systems, Daniels and Bright.), what do you
expect the display to look like?

I think the characters are U+1832 U+1825 U+1837 U+1825 U+1828 (I'm not sure
about the n). My particular interest is the first U+1825. When there is no
preceding vowel in the word, this character takes on a different medial form
to distinguish it from the male U+1823. This is a normal behaviour of
Mongolian. The different form is listed as being available with the use of a
variant selector in the Unicode table.

Would you expect the rendering software to spot there was no preceding vowel
in the word and automatically select the correct medial glyph? Or would you
expect the software to display the default medial glyph for U+1825 which
looks like that for U+1823 and the user would have to include a variant
selector 1 to achieve the desired result?

Or to put it another way are the variant selectors rarely used (for unusual
situations) or more frequently used for any situation where the default
glyph in that position is not the desired one?
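
To make the two alternatives concrete, here is a rough Python sketch of the two plain-text spellings. It assumes that "variant selector 1" means U+180B MONGOLIAN FREE VARIATION SELECTOR ONE, and it uses the code points listed above (the final n is still the uncertain one). The helper name is invented for the example.

```python
FVS1 = "\u180b"  # assumed to be the "variant selector 1" meant above

# t, oe, r, oe, n as listed in the message
toeroen = "\u1832\u1825\u1837\u1825\u1828"


def with_selector_after_first(text: str, target: str, selector: str = FVS1) -> str:
    # Insert the selector directly after the first occurrence of target.
    i = text.find(target)
    if i < 0:
        return text
    return text[: i + 1] + selector + text[i + 1 :]


# Alternative 1: the renderer is expected to pick the medial form itself.
implicit = toeroen
# Alternative 2: the user marks the first U+1825 explicitly.
explicit = with_selector_after_first(toeroen, "\u1825")

print(" ".join(f"U+{ord(c):04X}" for c in implicit))
print(" ".join(f"U+{ord(c):04X}" for c in explicit))
```

Which of the two strings is the intended encoding of the word is exactly the question being asked; the sketch only shows what each answer would look like in the data.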

I think this depends on whether the rendering software simply treats
Mongolian as like Arabic with alternate glyphs available for selection, or
has a deeper knowledge of the appearance of Mongolian. I believe Unicode
should take an explicit position on this as it has important implications
for successful rendering of plain text on various platforms. (If the deeper
knowledge position is taken, which I think is of significant benefit to the
user, then the exact rules that are to be supported need to be stated.)

The UNU/IIST report 170 takes a third position on the issue and in section 5
seems to misunderstand Unicode's distinction between characters and glyphs
and suggests the input method selects appropriate characters including some
from the PUA for presentation forms and ligatures. This appears to me to
be akin to a web font trick.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Saying characters out loud (derives from hash, pound, octothorpe?)

2002-07-08 Thread Timothy Partridge

William Overington recently said:

 Still no olde worlde shoppe name with a yogh in though yet?  :-)

Why bother with an old one when there is a current shop with a yogh? Do you
have a newsagent called Menzies in your part of England? (They have spread
from Scotland.) That isn't a zed (or zee) in the name; it's a yogh.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: Phaistos in ConScript

2002-07-08 Thread Timothy Partridge

Marco recently said:

  5. I find that mirroring the signs as you did in your font is
  unhistorical. The whole corpus is right-to-left, and the fact that the signs
  were impressed with types makes it impossible that the signs could have
  been reversed. In academic books, it is common practice to type the disc's
  text left-to-right, but the signs are not reversed.
  [Michael]
  I have followed Egyptological -- and ancient Egyptian -- practice 
  here. If the script is represented right-to-left the faces point to 
  the right so that you read into their faces. If the script direction 
  is reversed so that it is left-to-right, it is conventional -- among 
  Egyptologists and ancient Egyptians -- to reverse the signs as well. 

 I see. But Hieroglyphs were handwritten, not typed. Moreover, the
 mirroring of glyphs is actually attested for Egyptian.

  Godart does not reverse the glyphs even though he reverses the 
  directionality, but I think it is *his* practice which is 
  ahistorical, and I think it makes the text harder to read. And I 
  suspect it has to do with the font technology he had in 1994 when he 
  wrote his book.

 It seems that July 2002 is our disagreement month... I think that Godart
 was perfectly right avoiding assumptions that he could not support: there is
 no reason to think that the Phaistos script should work as Egyptian
 hieroglyphs work.

I would support you in this. Michael says that all the scripts in the region
go both ways, but we don't even know that the disk is from the region. (And the
headdresses apparently don't look local.) It might have come some way in trade.

I feel tempted to protest that the characters aren't in the right order, but
someone might take me up on that :-) I'm probably right though!

[The reason I haven't replied directly to Michael's message is that
something about his messages crashes my mail reader when I try it. Apologies
to everyone for accidentally including a load of message headers last time I
tried a workaround.]

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: Inappropriate Proposals FAQ

2002-07-03 Thread Timothy Partridge

Marco Cimarosti recently said:

 - No presentation glyphs for shapes that can already be obtained using
 regular characters in conjunction with ZWJ or ZWNJ.

Why not just presentation glyphs in general? We seem to have queries about
Indian conjuncts fairly frequently.

Some more suggestions (some of which have been covered from other angles already)

- No scripts with a limited body of text in existence. (No need to exchange
or analyse on computer.) E.g. Phaistos disk script

- No scripts which are poorly understood and it is not clear as to what the
characters are. E.g. Rongo-rongo.

- No symbols that are just a picture of something with no other meaning e.g.
a dog. (These tend not to have a fixed conventional form.)

- No symbols that are only used in diagrams rather than running text. e.g.
electrical component symbols.

- No personal, idiosyncratic or company logos. E.g. the artist when he was
not known as Prince.

- No archaic styles of existing characters. E.g. dotless j.

- No control codes for fancy text. E.g. begin bold

- No characters that can be obtained by using a different font with existing
characters and have no semantic difference from the existing characters.

- No proposals to rename existing characters. (But a clarifying note might be added.)

- No proposals to reposition existing characters, e.g. so they sort better.

- No proposals for a newly invented character since putting it in the
standard would help promote its use. (Significant usage must come first.)

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Chromatic font research

2002-06-27 Thread Timothy Partridge

Sampo Syreeni recently said:

 National flags are a far cry, true. Naval signalling ones perhaps aren't.
 They stand for characters and I believe in some variations for entire
 well-known concepts. They are utilized in a way we would expect characters
 to be. I don't think the entire collection of flags used around the world
 coincides neatly enough with an already encoded script to be considered
 pure glyph variants. And colors are certainly meaningful in this context.

 (I can't fathom why anyone would want to encode those, though. Anything
 you can do with flags you can do with ordinary characters, only more
 efficiently. However, this could serve as an example of a script which
 relies on color as an essential feature.)

I'd agree that you wouldn't want to encode them, but you might want to make
a font where each signaling flag is in the place of its corresponding
character. That would be a use for chromatic fonts. The only other use that
springs to mind is Egyptian hieroglyphics which have a colouring scheme when
written in full colour. (Of course colour isn't *required* when reading
them, it is just an aid that helps recognition.)

As someone (Doug?) pointed out a little while back on another thread, fonts
are (mis)used to hold collections of graphics conveniently. I imagine that
if chromatic fonts were available this kind of usage would grow. It would
also allow things like illuminated capitals to be put in a font rather than
supplied as a collection of graphics files.

   Tim 

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: 3 big bidi bugs

2002-05-29 Thread Timothy Partridge

Bernard Miller recently said:

 This can be fixed by rewording step L2 such that a reversal happens from the
 highest embedding level to each lower contiguous embedding level, regardless
 if the embedding level is represented by a character on the line, until the
 embedding level of 1 is reached (or, as an optimization, until the first odd
 embedding level equal to or lower than the lowest embedding level
 represented by a character on the line).

I had always interpreted L2 in the manner of your suggested correction, but
perhaps the language could be clarified.
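
For what it is worth, that reading of L2 can be sketched in Python as below. This is a simplified illustration, not the text of the rule: it takes the resolved embedding levels for one line as given, runs down to level 1 as in the suggested rewording, and visits every level on the way, whether or not a character on the line has it. The function name is invented for the example.

```python
def reorder_line(chars: str, levels: list) -> str:
    # chars and levels are parallel: levels[i] is the resolved level of chars[i].
    if not levels:
        return ""
    out = list(chars)
    lv = list(levels)
    # Every level from the highest down to 1 is visited, present or not.
    for level in range(max(lv), 0, -1):
        i = 0
        while i < len(out):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(out) and lv[j] >= level:
                j += 1
            # Reverse the contiguous run at this level or higher.
            out[i:j] = out[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return "".join(out)


# Level 2 has no character on this line, but the run of 3s is still
# reversed at levels 3, 2 and 1.
print(reorder_line("abcd", [1, 3, 3, 1]))  # dcba
```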

 (2)  Line width dependent mangling, spelling conventions for quotes:
 What is the purpose of step X10 if not to allow something like LEFT DOUBLE
 QUOTATION MARK to be used as if it was an OPEN DOUBLE QUOTATION MARK? One
 simply puts an embedding inside a quotation, such as “RLEquotationPDF”.

Surely if the quotation is meant to be right to left the RLE and PDF should
be outside the entire thing, including the quotes. After all the intention
is for the quotes to match the text is it not?  

 (3)  Mirroring ambiguities: 
 What if eor = sor? 

 text:                         R RLO whatever PDF N LRO whatever PDF
 embedding level at step X9:   1 3            3   1 2            2
 directional type at step X10: R R            R   ? L            L

Have you perhaps misunderstood sor and eor? They are imaginary things
inserted at the run boundaries, not a role undertaken by an actual character
inside the run.

For the above I make them as follows:

 text:                           R      RLO whatever PDF      N      LRO whatever PDF
 embedding level at step X9:     1      3            3        1      2            2
 sor/eor:                      s   e  s                  e  s   e  s                  e
 directional type at step X10: R R R  R R            R   R  R ? L  L L            L   L

In particular at the start of the level 1 run in the middle the highest
level on either side of the boundary is 3 so the direction of the sor (and
the preceding eor) is R. At the end of the run the highest level is 2 so
the eor is L as is that of the following sor.

The Neutral has a conflict of directions surrounding it so it takes the
embedding direction which is R.
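
The same working can be written as a short Python sketch. It assumes the runs have already been split by level and that the paragraph embedding level is 1; the function names are invented for the illustration, and only the sor/eor rule and the neutral fallback described above are modelled.

```python
def direction(level: int) -> str:
    # Odd levels are right-to-left, even levels left-to-right.
    return "R" if level % 2 else "L"


def sor_eor(run_levels: list, paragraph_level: int) -> list:
    # For each run, sor and eor take the direction of the higher of the two
    # levels on either side of the boundary. At the ends of the text the
    # paragraph level stands in for the missing neighbour.
    result = []
    for i, level in enumerate(run_levels):
        before = run_levels[i - 1] if i > 0 else paragraph_level
        after = run_levels[i + 1] if i + 1 < len(run_levels) else paragraph_level
        result.append((direction(max(before, level)), direction(max(level, after))))
    return result


def resolve_neutral(before: str, after: str, level: int) -> str:
    # A neutral between matching directions takes that direction;
    # otherwise it takes the embedding direction.
    return before if before == after else direction(level)


runs = [1, 3, 1, 2]  # R | RLO whatever PDF | N | LRO whatever PDF
marks = sor_eor(runs, paragraph_level=1)
print(marks)  # [('R', 'R'), ('R', 'R'), ('R', 'L'), ('L', 'L')]
print(resolve_neutral(*marks[2], level=1))  # R
```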

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: [OT] Re: The exact birthday of French: 0842-02-14

2002-03-28 Thread Timothy Partridge

Elliotte Rusty Harold recently said:


 What's really needed to conclusively disprove this hypothesis is a 
 verifiable event well in the middle of the problematic years that can 
 be dated both backwards and forwards in time; i.e. that can be 
 established as N years before the present and X years after the reign 
 of one of the Caesars (or something similarly well-established.) Here 
 event should be understood quite broadly to include not only 
 battles, deaths of kings etc. but also buildings, coins, natural 
 phenomena like comets and eclipses, etc.

 The test of a good hypothesis is its falsifiability, and that's true 
 whether it's right or wrong or somewhere in-between.  What 
 distinguishes science from pseudo-science (and perhaps history from 
 pseudo-history) is that pseudo-science is generally not falsifiable. 
 I think this hypothesis is clearly falsifiable. Is there an 
 astronomer in the house?

A potential problem with lunar eclipses is that the cycle repeats every 18
and a bit years, and this has been known for a long time. So a really
ingenious faker could have cut out an appropriate number of years. Seems a
bit of a leap though to realise that eclipses could be used to verify dates.

As for the number of days out of sync since Julius Caesar's time, I don't
have the full details but the calendar had problems after Julius changed it.
His Greek astronomer said leap years every four years. So they did.
Unfortunately the Romans counted inclusively but the Greeks exclusively
(like we do). So every four years to the Romans is what we would call every
three years. It took them a while to realise. Augustus had a go at the
calendar too. Pinched a day from February leaving it with just 28/29 (Julius
gave it 29/30) and gave it to the month renamed after him (so it would be
the same length as July). Would that cause a one day shift of the spring
equinox too?
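
The inclusive-counting muddle is easy to show with a toy calculation in Python. The year numbers are arbitrary labels starting from 1, not real dates, and the function name is invented for the example.

```python
def leap_years(first: int, how_many: int, every: int = 4, inclusive: bool = False) -> list:
    # Counting inclusively, the starting year is counted as "one", so
    # "every fourth year" lands only three years later.
    step = every - 1 if inclusive else every
    return [first + k * step for k in range(how_many)]


print(leap_years(1, 4))                  # [1, 5, 9, 13]  what was meant
print(leap_years(1, 4, inclusive=True))  # [1, 4, 7, 10]  what was done
```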

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Synthetic scripts

2002-03-18 Thread Timothy Partridge

Doug Ewell recently said:

 The closest I can come is something like a script that was invented,
 generally by one person and in a relatively short period of time, rather
 than evolving from existing scripts in a gradual and progressive
 manner.

 But right away that definition includes not only Shavian, Tengwar,
 Cirth, Klingon, and most of the contents of ConScript, but also
 Ethiopic, Cherokee, Canadian Syllabics, Gothic, Deseret, and maybe Yi
 Syllabics, all of which are already encoded in Unicode.
[snip]
 I still believe that separating writing systems into a natural or
 real category and an artificial or fictional or synthetic
 category is much less straightforward than those labels imply.

If I went to a community whose language doesn't have a written form and
convinced them that Tengwar would be an ideal way of recording their
culture, would that make Tengwar more legitimate? Or cause people to regard
it as a higher priority? 

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: ISO 3166 (country codes) Maintenance Agency Web pages move

2002-02-28 Thread Timothy Partridge

John Cowan recently said:

 Just how old are house numbers, anyway?  Not the *concept* of
 numbering houses (which seems to be 18th century), but actual unaltered
 house numbers?  Anyone know?

I would imagine they are relatively stable. Property boundaries can be very
long lived, since unless you own both properties either side of one, you
can't change it. (Governmental interference can change things without the
owners' consent, of course.)

A change of street name might be an opportunity for renumbering.

I wonder how long 10 Downing Street, London has been around?

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: UTF-17

2001-06-24 Thread Timothy Partridge

 Has anyone already proposed a *UTF-17S*, where astral
 characters are encoded with a 16-byte sequence?

Actually this would be ideal for my astrological database
programmed in FORTH. UTF-16 sorting compatibility is
essential for my application. Due to a five character file name
limit I'll have to call UTF17S UTF17, but I'm sure this won't
confuse any of my users.

   Tim

Historical footnote: The FORTH language would have been
called FOURTH (its creator felt it was fourth generation),
but the OS it was written on had a limit of 5 letters for files.







Re: On the possibility of guidance code points for the Private Use Area

2001-04-23 Thread Timothy Partridge

Peter recently said:

 William is certainly touching on an important issue: how does your software
 know how to interpret my PUA codepoints. I commend him for thinking about
 the issue, and his thinking outside the box. I don't think I or SIL would
 buy into his suggestion, however. The biggest flaw, which thoroughly
 undermines the ability of this system to work, is that your software has no
 way to actually know whether I'm following these conventions or not.
 Effectively, you're still dependent upon individual agreement between users
 as to the meaning of PUA codepoints.

A good point. A possible workaround would be a new plane-14 tag character.
But as Ken points out the world isn't complex enough yet to need a
standardised way of describing how you're being non-standard.

   Tim


-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





RE: Final letters in Hebrew and Arabic

2001-03-12 Thread Timothy Partridge

James Agenbroad recently said:

 On Sat, 10 Mar 2001, Jonathan Rosenne wrote:

  Regarding Hebrew:
  
   -Original Message-
   From: Nick NICHOLAS [mailto:[EMAIL PROTECTED]]
   Sent: Friday, March 09, 2001 10:12 PM
   To: Unicode List
   Cc: Nick NICHOLAS
   Subject: Final letters in Hebrew and Arabic
 
   (1) When a letter with a final variant appears alone --- say as a numeral,
   or in discussion of the letter or phoneme --- does it under any
   circumstances appear in its final form, or is it always medial?
  
  Monday, March 12, 2001
 When Hebrew letters are used as numbers, (probably not a current
 mainstream practice) the final forms of kaph, mem, nun, pe and ssadhe are
 used to represent 500, 600, 700, 800 and 900. My source: "Alphabete und
 Schriftzeichen des Morgen- und des Abendlandes." 2. Aufl. Berlin:
 Bundesdruckerei, 1969.  Hence my use of German transliterated letter names.
 Use of medial forms would thus change the numeric value; this would also
 mean the final forms could appear in the middle of a number.  Nakanishi
 (p. 32), Daniels and Bright, (p.490) and Van Ostermann (1952, p.120) only
 give numeric values for Hebrew letters through 400. I do not know if it is
 safe to infer from their silence that use of final forms for 500 to 900
 is a seldom used twig of a seldom used branch. 

Gesenius' Hebrew Grammar Section 5k doesn't mention these. Instead it says a
preceding taw is used to add an extra 400. It also says that thousands are
sometimes denoted by two dots above the letter, e.g. aleph with two dots is
one thousand.
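
The two schemes for 500 to 900 can be compared with a minimal sketch. It assumes a plain greedy additive system over the usual letter values; the function name and the `use_finals` flag are illustrative, and it ignores the thousands marking and the customary special spellings of 15 and 16.

```python
# Ordinary letter values, largest first (tav = 400 ... alef = 1).
BASE = [
    (400, "\u05EA"), (300, "\u05E9"), (200, "\u05E8"), (100, "\u05E7"),
    (90, "\u05E6"), (80, "\u05E4"), (70, "\u05E2"), (60, "\u05E1"),
    (50, "\u05E0"), (40, "\u05DE"), (30, "\u05DC"), (20, "\u05DB"),
    (10, "\u05D9"), (9, "\u05D8"), (8, "\u05D7"), (7, "\u05D6"),
    (6, "\u05D5"), (5, "\u05D4"), (4, "\u05D3"), (3, "\u05D2"),
    (2, "\u05D1"), (1, "\u05D0"),
]

# Final forms used for 500..900 in the scheme James describes.
FINALS = [
    (900, "\u05E5"), (800, "\u05E3"), (700, "\u05DF"),
    (600, "\u05DD"), (500, "\u05DA"),
]


def hebrew_numeral(n, use_finals=False):
    """Return the letters for 1 <= n <= 999 in logical (largest-first) order."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 is handled in this sketch")
    table = FINALS + BASE if use_finals else BASE
    letters = []
    for value, letter in table:
        # Greedy: repeat a letter while it still fits (tav can repeat).
        while n >= value:
            letters.append(letter)
            n -= value
    return "".join(letters)


# 500 as tav + qof (400 + 100) versus a single final kaph.
print(hebrew_numeral(500), hebrew_numeral(500, use_finals=True))
```

With the taw scheme 900 comes out as taw, taw, qof; with final forms it is a single final ssadhe.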

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: Klingon silliness

2001-02-27 Thread Timothy Partridge

Tex Texin recently said:

 Perhaps the real question is what are the criteria for including or
 excluding a fictional script. I have deleted John's mail, but
 his criteria applied more broadly than Klingon if I recall.

 Should we worry about elvish communication and not Klingon?
 Do we apply a business case to fictional scripts and not
 to other scripts?

Some of these scripts are in the PUA Conscript registry. Perhaps if a
significant body of text using a PUA encoding built up and was used for
interchange between many interested parties then it could be considered for
promotion into the standard.

On the subject of it taking hundreds of years to fill up the reserved space,
and the lack of available characters, perhaps the most likely event for
filling it is contacting aliens! How come the Klingons only have one
language and script? :-) (Or did one of the movies have a different collection
of glyphs?)

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: FW: Greek questions

2000-11-04 Thread Timothy Partridge

This isn't an official answer, but here goes.


 First, rendering of U+03C3 "Greek small letter sigma". Is it
 allowed and/or encouraged for an application to render this
 code as a final sigma glyph when it occurs word-finally, or
 would this behavior be incorrect?

U+03C3 doesn't take contextual forms. Generally speaking the
standard explicitly mentions contextual forms if shape changes
are to take place. Hebrew is in a similar situation.

 Why is 03C2 not given a compatibility decomposition of
 "final 03C3"?

No idea.
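
That the two sigmas are separate codes with no decomposition linking them can at least be checked against the character data. A small sketch using Python's standard unicodedata module (the helper name is illustrative):

```python
import unicodedata

SIGMA = "\u03C3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03C2"  # GREEK SMALL LETTER FINAL SIGMA


def has_decomposition(ch):
    """True if the character database lists any decomposition for ch."""
    return unicodedata.decomposition(ch) != ""


for ch in (SIGMA, FINAL_SIGMA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), has_decomposition(ch))

# Compatibility normalisation leaves the final sigma untouched.
print(unicodedata.normalize("NFKC", FINAL_SIGMA) == FINAL_SIGMA)
```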

 Assuming that it is legitimate (and indeed should be
 recommended) for 03C3 to be rendered contextually, should
 there be a separate code for "Greek sigma symbol" that
 would be used by mathematicians, etc., when the "normal"
 behavior of the letter sigma is not wanted?

As you point out, contextual shaping would cause problems.
There isn't another code AFAIK.

 Section 2.6 "Combining characters" states that "Some specific
 combining characters override the default stacking
 behavior...", [snip] Is there a definitive list of the "specific"
 combining characters that should exhibit such exceptional
 behavior? Or are implementors left to discover the exceptions
 for themselves?

I'm not aware of a definitive list, and I agree one would be
useful. Vietnamese and Hebrew are the only other ones I can
think of offhand. Thai combiners keep a fixed distance from
the baseline, so although they stack they don't (need to)
move.

Tim





Re: Colours

2000-10-22 Thread Timothy Partridge

"William Overington" [EMAIL PROTECTED] said:

 I am reminded of some pictures I once saw on collectable
 postcards.  The pictures were reproductions from a medieval
 book, possibly, but I am not sure, The Tres Riches Heures of
 the Duc du Berry, which is a famous manuscript book.

 Some of the numbers were black and some were red.

Red letter days are certain Holy days and Saints' days in the
Christian calendar. Apparently the list was standardised by the
Council of Nicaea in A.D. 325.

25th Jan Conversion of St Paul
2nd Feb Purification
24th Feb St Matthias
25th Mar Annunciation
Ash Wednesday
25th Apr St Mark
1st May St Philip and St James
Ascension Day
11th June St Barnabas
24th June St John the Baptist
29th June St Peter
25th July St James
18th Oct St Luke
28th Oct St Simon and St Jude
1st Nov All Saints
30th Nov St Andrew
21st Dec St Thomas

Dateless days in the above depend on the date of Easter.
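
The dateless entries can be worked out once Easter is known. A minimal sketch, assuming the Gregorian (Western) reckoning, the anonymous Gregorian Easter algorithm, and the usual offsets of 46 days before Easter for Ash Wednesday and 39 days after for Ascension Day; the function names are illustrative.

```python
from datetime import date, timedelta


def easter(year):
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def movable_days(year):
    """Dates of the two Easter-dependent days in the list."""
    e = easter(year)
    return {
        "Ash Wednesday": e - timedelta(days=46),
        "Ascension Day": e + timedelta(days=39),
    }


print(easter(2000), movable_days(2000))
```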

   Tim





RE: the Ethnologue

2000-09-14 Thread Timothy Partridge

Peter Constable said:

 On 09/13/2000 12:04:24 PM "Ayers, Mike" wrote:

 What I'd really like to know is why there seems to be this insistence on
 only one official list of languages when there appears to be a clear need
 for two.  There appears to be interest for a comprehensive, if imperfect,
 list on one hand, whereas other applications (web use, etc.) are
 interested in a fully researched list like RFC1766 provides.  Why must
 these be the same list?  Can't we acknowledge that it's going to take a
 long time to get everything right and work from two eventually converging
 lists? Just wonderin'...

 I have no problem with that whatsoever. Creating an alternate
 namespace mechanism with Ethnologue codes in a separate
 namespace seems to offer exactly what you describe.

I'm wary of having two competing namespaces. As an alternative,
I'd like to suggest something on the lines of en-cockney.
Why not have iso-e-ethnologue as tags? This would be especially
useful where there was just a miscellaneous ISO code.

Applications could choose to parse just the ISO bit, or go for
the full details. When extra languages are added to ISO, the
tags would become out of date, but it would be relatively
easy to identify which of the old tags needed updating.
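
To make the "parse just the ISO bit" idea concrete, here is a minimal sketch. The tag layout (ISO code, then a literal "e", then the Ethnologue code) is only the suggestion above, not an agreed syntax, and the function name and the sample code "abc" are hypothetical.

```python
def split_tag(tag):
    """Split a hypothetical 'iso-e-ethnologue' tag.

    Returns (iso_code, ethnologue_code); the second item is None when the
    tag has no '-e-' extension.
    """
    parts = tag.split("-")
    iso = parts[0].lower()
    if len(parts) >= 3 and parts[1].lower() == "e":
        return iso, parts[2].upper()
    return iso, None


# An application that only understands ISO codes keeps the first item.
print(split_tag("mis-e-abc"))   # 'mis' is the ISO miscellaneous code
print(split_tag("en-cockney"))  # no Ethnologue part
```

When ISO later adds a code for a language, tags needing an update are those whose first item is still the miscellaneous code and whose second item names that language.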

One potential snag is choosing which ISO tag would prefix a
given Ethnologue tag. Perhaps SIL could give definitive
opinions to avoid user divergence.

 Tim





Re: Splitting lists

2000-07-17 Thread Timothy Partridge

Sarasvati recently said:

 Munzir Taha wrote:

  I vote to your suggestion of opening a separate list.

 Recently there have been a few suggestions for dividing
 the list into separate lists.  Unfortunately, Sarasvati
 runs a Benevolent Dictatorship, not an Athenian Democracy,
 and she believes in Bacchanalian co-educational experiences
 for all.

I don't think we're ready for a touch of satyr.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer




Re: Looking For Information

2000-06-28 Thread Timothy Partridge

Harry Aufderheide recently said:

 I work for a large global firm in the transportation industry and we are
 taking a high-level look of our future business requirements for and the
 I.S. effort to properly handle all the characters of all the languages
 currently in use on the planet earth. 

 I have some specific questions but am interested in hearing anything related
 to work effort required, issues, concerns, etc. First some background.

 Our operating environment includes many IBM mainframes (multiple locations),
 AS/400s, UNIX platforms, various handheld data collection devices, and a
 large number of Windows NT clients and servers. Our applications run the
 gamut including data collection, customer focus internet, marketing, sales,
 financials, package tracking,  billing, you name it we probably have it
 somewhere. Data for the most part is stored centrally on the IBM mainframes.
 Our programming languages also run the gamut including COBOL, C, C++, HTML,
 etc.

 We truly have an international presence but currently only receive data in
 English, French, Italian, German, and Spanish and, at least, some characters
 in other single byte languages. We are experiencing limited difficulties in
 properly handling all the single byte characters received. My belief is that
 this is due to program language character definition, code page, and
 EBCDIC/ASCII differences on the various platforms. We are now "putting out
 fires" while looking for a better single byte solution and future double
 byte requirements.


 Based on everything that I have read the UNICODE standard is the way to go;
 hence my questions.

 1. Is the UTF-8's character set equal to the Latin-1 (ASCII) Code Page's? If
 not, what are the differences?
   Under the assumption that it is substantially the same; I don't see
 it solving our problems
   as we are currently processing more characters than this can
 support. It certainly doesn't 
   appear a solution for handling Chinese, Japanese, etc.
   
   This leads me to the UTF-16 format with its double byte capability.

 2. I have read a good deal of material on support of UNICODE (UTF-x) on many
 platforms but have not found much about the mainframe (EBCDIC) environment
 other than DB2 support for UNICODE.
 Assuming that we will have the need to process characters that require
 double byte technology and assuming that we have already done a good job
 of internationalizing our applications

I have an interest in this sort of information too.

The first question may be which versions of DB2 are in use.
I think DB2 OS/400 supports CCSID 13488, UCS-2 Level 1. (UCS-2 is UTF-16
restricted to plane zero. It might manage UTF-16 too without too much effort.)
I'm not sure whether DB2 on other platforms supports this CCSID.

UTF-16 is an encoding built on two-byte code units, but I don't think that
is quite the same as an IBM double byte character set (DBCS).

I know very little about IBM DBCS, but the impression I have is that
there are Shift In and Out control characters that swap between
single and double byte modes.

UTF-16 is modeless and its code units are always two bytes (characters
outside plane zero take two code units).
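
The byte counts are easy to check. A short sketch using Python's built-in "utf-16-be" codec (the helper name is illustrative); note that a character outside plane zero is stored as a surrogate pair, so it occupies four bytes rather than two.

```python
def utf16_length(ch):
    """Number of bytes one character occupies in UTF-16 (big-endian, no BOM)."""
    return len(ch.encode("utf-16-be"))


for ch in ("A", "\u4E2D", "\U00010000"):
    print(f"U+{ord(ch):04X}", utf16_length(ch), "bytes")
```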

Could an IBMer shed light on the following:
Do IBM DBCS strings assume starting in single byte mode?
And would the presence of certain bytes in UTF-16 trigger a switch from
double to single byte mode?
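
Whatever the answer, it is easy to show that UTF-16 data can contain the byte values in question. A minimal sketch, assuming the shift codes are SO = 0x0E and SI = 0x0F; whether an IBM system would act on them depends on how the data is tagged, which this does not settle. The function name is illustrative.

```python
SO, SI = 0x0E, 0x0F  # assumed Shift Out / Shift In byte values


def shift_like_bytes(text):
    """Return (offset, byte) pairs in the UTF-16BE form that equal SO or SI."""
    data = text.encode("utf-16-be")
    return [(i, b) for i, b in enumerate(data) if b in (SO, SI)]


print(shift_like_bytes("ABC"))     # plain Latin: none
print(shift_like_bytes("\u0E01"))  # Thai KO KAI: high byte is 0x0E
print(shift_like_bytes("\u010F"))  # d with caron: low byte is 0x0F
```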

IBM have defined UTF-EBCDIC. (Details are available as a technical report on
www.unicode.org.) This converts Unicode characters into a variable number of
bytes in a similar way to UTF-8. The basic letters A-Z and digits 0-9
are mapped to their corresponding EBCDIC codes. This means that when these
particular characters are stored on an EBCDIC platform they are readable in
that format. Other characters are mapped to sequences of non-control codes.
This allows them to be shown on a terminal as weird looking sequences of
characters, but ones which won't send any weird control codes to the
terminal.
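
For reference, the EBCDIC codes for the basic letters and digits can be listed with a short sketch. Python has no UTF-EBCDIC codec, so this uses the built-in "cp037" EBCDIC code page as a stand-in for those characters only; the helper name is illustrative.

```python
def ebcdic_hex(text, codepage="cp037"):
    """Hex dump of text encoded in an EBCDIC code page (cp037 assumed)."""
    return " ".join(f"{b:02X}" for b in text.encode(codepage))


print(ebcdic_hex("AZ09"))
```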

Although UTF-EBCDIC exists I have not seen much sign of support for it.
For example, is it possible to print UTF-EBCDIC on a mainframe printer?
Can any terminals show it? (Or terminal emulators on PCs.)

At the moment UTF-EBCDIC seems to be of most use if you want to use the
mainframe as a database server and translate into UTF-16 or UTF-8 when
talking to the outside world. (A simple translation program would be
needed.)
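
The shape of such a translation program is roughly as follows. This is only a sketch: it decodes an ordinary EBCDIC code page ("cp037", a Python built-in) because a UTF-EBCDIC decoder is not part of the standard library and would have to be written separately; the function name and defaults are illustrative.

```python
def transcode(data, source="cp037", target="utf-8"):
    """Decode bytes from the mainframe encoding and re-encode for the outside world."""
    return data.decode(source).encode(target)


stored = b"\xC8\xC9"  # "HI" as held on the EBCDIC side
print(transcode(stored))
print(transcode(stored, target="utf-16-be"))
```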

 I see the need, across all platforms, for:

   - redesigning many of our files

Extra length may be needed for some fields.

   - making program changes specific to these physical changes (file
     layouts, working storage, user interfaces)
   - modifying all logic operating on text (string) data

Sorting and string comparison can be complex. (This is due to the complexities
of people's sorting needs, not anything inherent in Unicode.)
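
A small illustration of why plain code point order is rarely what users expect. The folding key below (strip accents, ignore case) is just one crude assumption about what a user might want; real collation rules differ by language, and the function name is illustrative.

```python
import unicodedata


def fold_key(s):
    """Crude sort key: drop combining marks, then ignore case."""
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


words = ["zebra", "\u00C4pfel", "apple"]  # second word starts with A-umlaut

print(sorted(words))                # code point order: A-umlaut sorts after z
print(sorted(words, key=fold_key))  # folded order: A-umlaut sorts with a
```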

Regards,

   Tim


-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer