Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
E. Keown k underscore isoetc at yahoo dot com wrote:

 What's the point, really, of going far beyond, even
 beyond CSS, into XHTML, where few computational
 Hebraists have gone before?

 Sorry, but I think this stuff is the least interesting
 thing one can do on a computer (no offense).  Well,
 COBOL was my worst experience so far...

You are right.  There shouldn't be any need to resort to fancy tricks,
or even XHTML (which is by no means fancy), just to display Hebrew
properly on a variety of browsers.  That was your original question.

I think the most important thing, if you want to ensure correct
operation on as many platforms as possible, is to validate your HTML
using the W3C Markup Validation Service:

http://validator.w3.org/

That will keep you from accidentally using browser-specific tricks and
ensure that your HTML is clean.  Most browsers will behave correctly
when handed clean HTML.

Beyond that, you might want to specify a font family using CSS (doesn't
have to be in a separate CSS file, either) to improve the odds that the
reader will see Hebrew instead of hollow boxes, but this is optional.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread Christopher Fynn
Doug Ewell wrote:
Beyond that, you might want to specify a font family using CSS (doesn't
have to be in a separate CSS file, either) to improve the odds that the
reader will see Hebrew instead of hollow boxes, but this is optional.
While we are on the (off) topic of HTML, browsers etc. 
I've noticed that with Windows and IE, when going to a page with 
characters in a script for which no fonts are installed on my system, IE 
will sometimes ask whether or not I want to download and install fonts for 
that script from Microsoft's web site.
This only happens in some cases, even where the same script is 
involved. I've looked at the source of some of these pages but I've never 
been able to identify just what triggers this. Does anyone know?

- Chris



Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Michael (michka) Kaplan
From: Stefan Persson [EMAIL PROTECTED]

 I haven't used M$ IE for many years, though, and my
 memory might be wrong.

Blinded by the misspelling of the product name, maybe? :-)

See http://msdn.microsoft.com/msdnmag/issues/0700/localize/ and the section
entitled "Choosing Character Sets" for info on what is going on here,
particularly figures 3 and 4 for info on how to script the behavior for the
UTF-8 case.

MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Windows International Division




Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread John Cowan
Michael (michka) Kaplan scripsit:

  I haven't used M$ IE for many years, though, and my
  memory might be wrong.
 
 Blinded by the misspelling of the product name, maybe? :-)

No, that's just a glyph difference.  :-)

 See http://msdn.microsoft.com/msdnmag/issues/0700/localize/ and the section
 entitled "Choosing Character Sets" for info on what is going on here,
 particularly figures 3 and 4 for info on how to script the behavior for the
 UTF-8 case.

Nice article, though it's obnoxious that the figures will only open
in a pop-up window.  

-- 
Ambassador Trentino: I've said enough. I'm a man of few words.
Rufus T. Firefly: I'm a man of one word: scram!
--Duck Soup John Cowan [EMAIL PROTECTED]



Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Philippe Verdy
From: Christopher Fynn [EMAIL PROTECTED]
I'd also like to figure out a way to trigger this kind of behavior in 
other browsers as well as in IE (using JavaScript or Java rather than VB) 
as not quite everyone uses IE - (but I guess you are not going to give me 
any more clues on how to do that :-) )
If only there were a portable way to determine in JavaScript that a string 
can be rendered with the existing fonts, or to enumerate the installed fonts 
and get some of their properties... we could prompt the user to install some 
fonts or change their browser settings, or we could auto-adapt the CSS style 
rules, notably the list of fonts given in the font-family property or the 
shorthand font property...

There are limited controls with the CSS @-rules that allow building 
virtual font names, but not enough to tune the font selections by script 
or by code point ranges. And JavaScript is of little help to palliate this.
Certainly there's a need to include in a refined standard DOM for styles the 
properties needed to manage preferred font stacks associated with a virtual 
font name (for example, in a way similar to what Java2D v1.5 allows), that 
can then be referenced directly within legacy HTML <font face="virtualname"> 
or in CSS font-family: virtualname properties (some examples of virtual 
font names are standardized in CSS: "serif", "sans-serif", "monospace"; 
Java2D or AWT adds "dialog" and "dialoginput"; but other virtual names could 
be defined as well, like "decorated" or "handscript" or "ocr").

The key issue here is to create documents that refer to font families 
according to their usage rather than their exact appearance and the limited 
set of languages and scripts they support.
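A rough sketch of Philippe's idea of usage-oriented "virtual" font names, written in Python only for illustration: a virtual name maps to a preferred font stack, and each character is resolved to the first installed font in the stack that covers it. The names VIRTUAL_FONTS, FONT_COVERAGE and resolve are hypothetical, and the coverage sets are invented toy data, not the real character maps of these fonts; no browser exposes such an API.

```python
# HYPOTHETICAL toy coverage data, not the real cmaps of these fonts.
HEBREW_BLOCK = set(range(0x0590, 0x0600))
BASIC_LATIN = set(range(0x0020, 0x007F))

FONT_COVERAGE = {
    "Ezra SIL": HEBREW_BLOCK | BASIC_LATIN,
    "Tahoma": BASIC_LATIN | set(range(0x05D0, 0x05EB)),  # letters only, no points
    "Arial": BASIC_LATIN,
}

# Usage-oriented virtual name -> preferred font stack, most preferred first.
VIRTUAL_FONTS = {
    "hebrew-text": ["Ezra SIL", "Tahoma", "Arial"],
}


def resolve(virtual_name, text, installed):
    """Return a list of (char, font) pairs for text.

    Each character goes to the first font in the virtual font's stack that
    is installed and covers it; None means nothing in the stack can draw it.
    """
    stack = [f for f in VIRTUAL_FONTS[virtual_name] if f in installed]
    result = []
    for ch in text:
        font = next((f for f in stack if ord(ch) in FONT_COVERAGE[f]), None)
        result.append((ch, font))
    return result
```

With only Tahoma and Arial installed, the Hebrew letters of a pointed word resolve to Tahoma while the vowel points resolve to None, which is exactly the situation where a script could warn the reader or offer a font download.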

Another possibility would be to create a portable but easily tunable font 
format (XML based? so that fonts could be created or tuned by scripting through 
DOM?) which would be a list of references to various external but actual 
fonts or glyph collections, plus parameters to allow selecting among them with 
various priorities. For now this is not implemented in font technologies 
(OpenType, Graphite, ...) but only within vendor-specific renderer APIs (which 
contain some rules to create such font mappings).




Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread E. Keown
 Elaine Keown
 Seattle (only 11 hours now...)

Dear Doug Ewell, fantasai and List:

I will try to sort out these diverse pieces of advice.

What's the point, really, of going far beyond, even
beyond CSS, into XHTML, where few computational
Hebraists have gone before?  

Sorry, but I think this stuff is the least interesting
thing one can do on a computer (no offense).  Well,
COBOL was my worst experience so far...

I've partly learned CSS, I guess---elegant placement
options!!--much better than HTML (clunky).  But I
discovered that the Web is full of bad CSS, even by
supposed gurus, and they never tell you which
browser/operating system/whatever their code might be
good for.

EK







Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Michael (michka) Kaplan
This is JScript in <script> tags in HTML -- client-side script.

I do not know if other browsers have solutions for this problem.

Michael

- Original Message - 
From: Christopher Fynn [EMAIL PROTECTED]
Cc: Michael (michka) Kaplan [EMAIL PROTECTED]; Unicode List
[EMAIL PROTECTED]
Sent: Sunday, November 21, 2004 7:49 AM
Subject: Re: [even more increasingly OT-- into Sunday morning] Re: Unicode
HTML, download


 Thanks Michael

 This is useful information. Unfortunately I usually need to use static
 HTML - so I can't use the ASP parts.  It would be nice to see something
 like this working on UTF-8 encoded web pages where lang is defined. In
 most cases knowing the text is in a specific language and knowing the page
 is Unicode would let you know which script is being used.
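Chris's point that lang plus a Unicode encoding usually identifies the script can be sketched as follows (Python, illustration only). The two-entry LANG_TO_SCRIPT table is a hypothetical stand-in for real locale data such as CLDR's likely-subtags, and taking the first word of a character's Unicode name is a crude heuristic, not a proper Script property lookup.

```python
import unicodedata

# HYPOTHETICAL two-entry table; real data would come from a locale registry.
LANG_TO_SCRIPT = {"he": "HEBREW", "bo": "TIBETAN"}


def scripts_in(text):
    """Set of script-name prefixes for the letters in text.

    Uses the first word of each letter's Unicode name (e.g. "HEBREW LETTER
    ALEF" -> "HEBREW"); marks, digits and punctuation are ignored.
    """
    found = set()
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def expected_script(lang, text):
    """Script implied by a lang value, falling back to inspecting the text."""
    primary = lang.lower().split("-")[0]          # "he-IL" -> "he"
    if primary in LANG_TO_SCRIPT:
        return LANG_TO_SCRIPT[primary]
    found = scripts_in(text)
    return found.pop() if len(found) == 1 else None   # ambiguous -> None
```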

 I'd also like to figure out a way to trigger this kind of behavior in
 other browsers as well as in IE (using JavaScript or Java rather than
 VB)  as not quite everyone uses IE - (but I guess you are not going to
 give me any more clues on how to do that :-) )

 regards

 - Chris











Re: Unicode HTML, download

2004-11-21 Thread Peter Kirk
On 21/11/2004 00:05, Edward H. Trager wrote:
...
   A better CSS class would additionally specify the font-family, 
   for example, something like the SIL Ezra font
   (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=EzraSIL_Home)

(4) Since your readers may not have certain fonts, in the case of legally
downloadable fonts like SIL Ezra I would definitely put a link to the
download site so readers can download the (Hebrew) fonts if they need them
to view your page.
 

Please don't use SIL Ezra for such purposes: it is a legacy-encoded 
and visually ordered Hebrew font, and it is not rendered correctly in IE6. 
Instead, please use Ezra SIL, which is basically the same outlines but 
properly Unicode encoded. The URL given is for Ezra SIL, and it is a 
free download.

By the way, this font mostly works fine with any Windows (95+) system. 
Office 2003 is required only for ideal placement of certain accents etc.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Unicode HTML, download

2004-11-21 Thread Peter Kirk
On 21/11/2004 15:28, Philippe Verdy wrote:
From: Peter Kirk [EMAIL PROTECTED]
On 21/11/2004 00:50, Philippe Verdy wrote:
...
<style type="text/css"><!--
.he {
   font-family: "SIL Ezra", "Arial Unicode MS", David, Miriam,
      Tahoma, Arial, sans-serif;
   direction: rtl;
}

This will absolutely NOT work because SIL Ezra is legacy encoded and 
the others are Unicode encoded. You should be using Ezra SIL. See my 
previous posting.

Thanks for this correction. I thought that this font was Unicode too...

Please read my earlier posting. Of course it does make things rather 
difficult that none of my postings ever get approved on a Sunday, 
especially when I am trying to correct seriously misleading factual errors.

But this creates an even more complicated case for creating a portable 
HTML page: as the font uses a specific encoding, how can characters be 
selected in that font, given that the page will be UTF-8 encoded and 
thus will contain numeric references to Unicode code points?

Does this font work as if it were assigning glyphs to ISO-8859-1 characters? If 
so, Elaine will need to use only Latin-1, which will be correctly 
rendered as expected only if the specific font is installed. If it is 
not, readers will see Latin-1 characters, not even the Hebrew 
characters present in most classic core fonts of their browser...

If you really want to know, the font SIL Ezra (which was never intended 
for Unicode use) uses PUA characters F020 to F0FF only. It is totally 
unsuitable for web use because it uses some of these PUA characters as 
combining marks, and this usage is not supported (for some reason which 
has never been explained) by the world's most popular browser (although 
it was supported by previous versions, hence breaking a large number of 
existing web pages using legacy encodings for Hebrew, Greek etc with 
diacritics). So please don't even think of how to trick browsers into 
using SIL Ezra - which would also require support for visual encoding.

So if she really wants to include character compositions which are 
only possible with Ezra SIL, she will need these two classes:

<style type="text/css"><!--
.he { font-family: "Arial Unicode MS", David, Miriam, Tahoma, Arial,
sans-serif; }
.heb { font-family: "Ezra SIL"; }
.he, .heb { direction: rtl; }
--></style>

No problem if you are using Ezra SIL, which is a different font from SIL 
Ezra, and is Unicode mapped and so can be mixed with the others you mention.

...
I still doubt that you need such a specialized font for Biblical Hebrew 
and Canaanite languages to create a technical translation glossary, 
which would probably use modern Hebrew only (so the "he" class above 
would probably be enough...)

David is a very adequate font for Hebrew with consonants and vowel 
points, as long as accents are not required - and Elaine is very 
unlikely to require them. Times New Roman is fine for unpointed 
consonantal Hebrew only as its Holam point is unfortunately broken. 
Arial and Arial Unicode MS are probably OK for modern Hebrew but look 
odd to those of us more used to the ancient language - and their Holam 
is also broken. Miriam doesn't look good at all, to me.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 So if she really wants to include character compositions which are
 only possible with Ezra SIL, she will need these two classes:

 <style type="text/css"><!--
 .he { font-family: "Arial Unicode MS", David, Miriam, Tahoma, Arial,
 sans-serif; }
 .heb { font-family: "Ezra SIL"; }
 .he, .heb { direction: rtl; }
 --></style>

 and use preferably the "he" class name for all Hebrew characters which
 can be represented with Unicode code points and Unicode fonts found in
 common browsers, surrounding only the specific sections requiring the
 SIL encoding mapped on ISO-8859-1 within <span class="heb"> elements.

Absolutely not.  No way.  A document should NEVER contain text in two or
more character encodings with changes indicated only by font
suggestions.

This approach will destroy searching capabilities, and will not ensure
proper rendering in any event.  The user who has Miriam but not Ezra SIL
(or vice versa) will see some Hebrew text rendered properly and some
improperly, for no apparent reason.  This is worse than either the
all-Unicode or all-Ezra approach.  Don't do it, Elaine.
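The search problem can be shown with a toy example (Python). FONT_HACK is a hypothetical mapping that stores Hebrew letters as arbitrary Latin-1 letters and relies on a special font to draw them as Hebrew; it is not the actual SIL Ezra layout, and real legacy fonts also used visual ordering, which this sketch ignores.

```python
# HYPOTHETICAL font-hack mapping, for illustration only.
FONT_HACK = {"\u05e9": "v", "\u05dc": "k", "\u05d5": "e", "\u05dd": "o"}


def encode_with_hack(text):
    """Replace Hebrew letters with the Latin-1 stand-ins a hack font expects."""
    return "".join(FONT_HACK.get(ch, ch) for ch in text)


unicode_text = "\u05e9\u05dc\u05d5\u05dd"    # "shalom" in real Hebrew code points
hacked_text = encode_with_hack(unicode_text)  # what a legacy-encoded page stores

query = "\u05e9\u05dc\u05d5\u05dd"            # a reader searches in Unicode Hebrew
print(query in unicode_text)   # True
print(query in hacked_text)    # False: the stored characters are not Hebrew
print(hacked_text)             # vkeo, what a reader without the font would see
```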

The only time a document should EVER be presented in mixed encodings is
for direct illustration of encoding issues (intended for Unicode
weenies) or in a MIME-like setting where the document is divided into
logical sections, with the encoding of each section clearly indicated.
This is true for all types of documents, not just Web pages.

If Elaine suspects that some of her HTML will not be displayed properly
with commonly available Unicode fonts, she will have to bite the bullet
and either:

(a) code the whole page in Unicode, and provide a link to a
comprehensive-enough Hebrew Unicode font, OR

(b) code the whole page in the legacy encoding, and provide a link to
Ezra SIL.
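If the goal is to keep each page in a single encoding, a simple lint can flag pages that mix real Hebrew code points with the PUA range cited for SIL Ezra. This is a hypothetical Python helper, not an existing tool; note that it cannot detect a Latin-1 override hack, since such text is indistinguishable from ordinary Latin-1.

```python
HEBREW = range(0x0590, 0x0600)        # Unicode Hebrew block
SIL_EZRA_PUA = range(0xF020, 0xF100)  # PUA range attributed to SIL Ezra in this thread


def encoding_mix(text):
    """Classify text as 'unicode', 'legacy-pua', 'mixed' or 'none'."""
    has_hebrew = any(ord(ch) in HEBREW for ch in text)
    has_pua = any(ord(ch) in SIL_EZRA_PUA for ch in text)
    if has_hebrew and has_pua:
        return "mixed"        # the case to avoid: two encodings in one page
    if has_pua:
        return "legacy-pua"
    if has_hebrew:
        return "unicode"
    return "none"
```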

Cryptically naming these two CSS classes .he and .heb, which
provides no indication of which is the Unicode encoding and which is the
Latin-1 hack, merely makes a bad suggestion worse.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/






Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
Peter Kirk peterkirk at qaya dot org wrote:

 Please don't use SIL Ezra for such purposes, which is a legacy encoded
 and visually ordered Hebrew font, and is not rendered correctly in
 IE6. Instead, please use Ezra SIL, which is basically the same
 outlines but properly Unicode encoded. The URL given is for Ezra SIL,
 and it is a free download.

This makes things a little clearer: Philippe's bad advice to mix
encodings was based on bad information, that doing so would be
necessary.

The best advice for Elaine's situation becomes simpler.  To maximize the
likelihood that readers will see the right glyphs, add a font-family
style line that lists a variety of available fonts, in decreasing order
of coverage and attractiveness.
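A minimal sketch of generating such a font-family line (Python; font_family_rule is a hypothetical helper). The font order just follows suggestions made in this thread (Ezra SIL, then David or Times New Roman), ending with a generic family. Specific family names are quoted, which is always valid CSS, while generic keywords such as serif must stay unquoted or they are treated as ordinary font names.

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def font_family_rule(selector, families):
    """Build a CSS rule listing font families in decreasing order of preference."""
    parts = []
    for name in families:
        if name in GENERIC_FAMILIES:
            parts.append(name)                       # generic keyword: no quotes
        else:
            escaped = name.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'"{escaped}"')             # specific family: quoted
    return f"{selector} {{ font-family: {', '.join(parts)}; }}"


print(font_family_rule(".he", ["Ezra SIL", "David", "Times New Roman", "serif"]))
# .he { font-family: "Ezra SIL", "David", "Times New Roman", serif; }
```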

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Font selection, font downloads, and (writing system) scripts

2004-11-21 Thread fantasai
This discussion belongs on www-style, so setting Reply-To to there.
Philippe, could you explain what you meant by
 The key issue here is to create documents that refer to font families
 according to their usage rather than their exact appearance and the
 limited set of languages and scripts they support.
?
~fantasai




Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
I wrote:

 The best advice for Elaine's situation becomes simpler.  To maximize
 the likelihood that readers will see the right glyphs, add a font-
 family style line that lists a variety of available fonts, in
 decreasing order of coverage and attractiveness.

Actually, of course, the only way to *guarantee* that readers will see
the right glyphs is to chuck HTML altogether and create a PDF file.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Font selection, font downloads, and (writing system) scripts

2004-11-21 Thread Rick McGowan
Fantasai wrote,

 This discussion belongs on www-style, so setting Reply-To to there.

And if you're going to do that then, as a matter of etiquette, please  
don't CC the Unicode list.

When you CC the Unicode list and some other list, people on the other list
may try to reply to all and include both lists. For hot topics, this can
result in a cross-posting mess and people seeing half the story. And some
people may get "you can't post here because you're not subscribed"
messages.

Thanks,

Rick





Re: Unicode HTML, download

2004-11-21 Thread Philippe Verdy
From: Doug Ewell [EMAIL PROTECTED]
The best advice for Elaine's situation becomes simpler.  To maximize the
likelihood that readers will see the right glyphs, add a font-family
style line that lists a variety of available fonts, in decreasing order
of coverage and attractiveness.
My bad advice actually came from confusion between two SIL-related 
fonts: one with a legacy encoding (handled in browsers as if it were ISO-8859-1 
encoded, so that you need to insert text in the HTML page using only the 
code points in the Latin-1 page starting at U+, even though they do not 
represent the correct Unicode characters), and the other coded with Unicode 
(for which you need to encode your text with Hebrew code points...).

But your advice, Doug, still won't work when multiple fonts in the 
font-family style use distinct encodings: mixing SIL Ezra with Arial or 
similar Unicode-encoded fonts will never produce the intended fallbacks if 
users don't actually have SIL Ezra installed and selectable in their 
browser environment.

Legacy-encoded fonts contain only a codepage/charset identifier (most often 
ISO-8859-1) and no character-to-glyph translation table. They also don't work 
properly with browsers configured for accessibility, where only the 
user-defined preferred fonts are allowed and fonts specified in HTML pages 
must be ignored by the browser, user styles having been given higher 
priority (even if the page uses the !important CSS marker), 
unless the default font mapping associated with the codepage/charset 
identifier actually corresponds to what would be found in a regular 
char-to-glyph mapping table in that font.




Re: Unicode HTML, download

2004-11-21 Thread Philippe Verdy
From: Doug Ewell [EMAIL PROTECTED]
Cryptically naming these two CSS classes .he and .heb, which
provides no indication of which is the Unicode encoding and which is the
Latin-1 hack, merely makes a bad suggestion worse.
It was not cryptic: "he" was meant for Hebrew (generic, properly 
Unicode encoded, suitable for any modern Hebrew), and "heb" for Biblical 
Hebrew, where a legacy encoding may still be needed in the absence of workable 
Unicode support for now. This won't be the same language, however, so a 
change of encoding may be justified. I was not advocating mixing 
encodings within the same text for the same language...

But I was nearly sure that Hebrew technical jargon would probably not 
need Biblical Hebrew, except for illustration purposes within small delimited 
block quotes or spans, where there will be simultaneous changes of:
- language level
- needed character set, some characters not being encodable with Unicode
- encoding (from Unicode to the Latin-1 override hack)
- the specific font needed to render the legacy encoding.
In that case, it is acceptable to have the general text in modern Hebrew 
properly coded with Unicode, even if the small illustrative quotes remain 
fully in a non-standard mapping and won't appear correctly without the 
necessary font.

Note that PDF files DO mix encodings within the embedded fonts that PDF 
writers dynamically create for only the necessary glyphs. These encodings 
are specific to the document, for each embedded font... This is why PDF 
files can encode text that still has no Unicode character mapping. You 
can see that when you attempt to copy/paste text fragments from PDF files in 
sections using embedded fonts: the pasted text will not reproduce the same 
characters as what you see in the PDF reader. Copy/pasting does, however, work 
for PDF files using external fonts with standard mappings. 




Re: Ezra

2004-11-21 Thread Edward H. Trager
On Sunday 2004.11.21 14:38:09 +, Peter Kirk wrote:
 On 21/11/2004 00:05, Edward H. Trager wrote:
 
 ...
 
A better CSS class would additionally specify the font-family, 
for example, something like the SIL Ezra font
(http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=EzraSIL_Home)
 
 (4) Since your readers may not have certain fonts, In the case of legally
 downloadable fonts like SIL Ezra, I would definitely put a link to the
 download site so readers can download the (Hebrew) fonts if they need 
 it to view
 your page.
  
 
 
 Please don't use SIL Ezra for such purposes, which is a legacy encoded 
 and visually ordered Hebrew font, and is not rendered correctly in IE6. 
 Instead, please use Ezra SIL, which is basically the same outlines but 
 properly Unicode encoded. The URL given is for Ezra SIL, and it is a 
 free download.

Are you saying the difference in names is "SIL Ezra" vs. "Ezra SIL"?
That's too confusing!

When I gave the URL, I checked that it was referring
to an OpenType font -- and since it was OpenType, I assumed that it was the
newer version with a Unicode cmap.  If SIL still has links to legacy non-Unicode
versions of fonts which now also have Unicode versions, then they should make
this really clear to people.  My apologies if I provided the wrong URL.

 
 By the way, this font mostly works fine with any Windows (95+) system. 
 Office 2003 is required only for ideal placement of certain accents etc.
 
 



Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread Edward H. Trager
On Sunday 2004.11.21 00:06:31 -0800, Doug Ewell wrote:
 E. Keown k underscore isoetc at yahoo dot com wrote:
 
  What's the point, really, of going far beyond, even
  beyond CSS, into XHTML, where few computational
  Hebraists have gone before?
 
  Sorry, but I think this stuff is the least interesting
  thing one can do on a computer(no offense).  Well,
  COBOL was my worst experience so far...
 
 You are right.  There shouldn't be any need to resort to fancy tricks,
 or even XHTML (which is by no means fancy), just to display Hebrew
 properly on a variety of browsers.  That was your original question.
 

I beg to differ with Doug Ewell here: using XHTML and some very basic
CSS1 is not, in my opinion, resorting to fancy tricks.  XHTML is very
simple to do correctly, and more consistent than HTML 4.01.  Philippe Verdy
also provided some good advice on what a CSS class for Hebrew might look
like.

XHTML has a consistent set of rules that apply across all tags : I would
argue that this is *easier* to learn and stick to than old-style HTML.
And proper use of CSS really allows one to separate one's content from
the display of that content.  For me, the combination of XHTML and CSS
is so much easier than what I used to suffer through in the bad old days
of HTML before CSS came along ...

I do agree with Doug that validation using the W3C or a similar validator
is absolutely essential.  But this thread is getting off-topic.  The intent of
my original post was merely to suggest that Elaine take a look at using XHTML,
CSS, and UTF-8 for her documents.




Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread Philippe Verdy
From: E. Keown [EMAIL PROTECTED]
Dear Doug Ewell, fantasai and List:
I will try to sort out these diverse pieces of advice.
What's the point, really, of going far beyond, even
beyond CSS, into XHTML, where few computational
Hebraists have gone before?
You're right, Elaine: the web is full of non-XHTML-conforming documents. You 
probably don't need full XHTML conformance either, but having your document 
respect the XML nesting and closure of elements is certainly a must today, 
because it avoids most interoperability problems in browsers.

So: make sure all your HTML element and attribute names are lowercase, and close 
ALL elements (even empty elements, which should be closed with /> instead of 
just >, for example <br /> instead of <br>, and even <li>...</li> or 
<p>...</p>).
And then don't embed structural block elements
   (like <p>...</p> or <div>...</div> or <blockquote>...</blockquote>
   or <li>...</li> or <table>...</table>)
within inline elements
   (like <b>...</b> or <font>...</font> or <a href="...">...</a>
   or <span>...</span>)
Note that most inline elements are related to style, and that styling is better 
kept out of the body by assigning style classes to the structural elements 
(most of them are block elements).
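These rules (lowercase names, every element closed, no block elements inside inline ones) can be checked mechanically. A minimal Python sketch using the standard xml.etree.ElementTree parser follows; xhtml_problems is a hypothetical helper, the BLOCK and INLINE sets are partial, and because it is a plain XML parser it only knows XML's five predefined entities, so a fragment using &nbsp; will be reported as not well-formed. It is no substitute for the W3C validator.

```python
import xml.etree.ElementTree as ET

# Partial lists, for illustration only; XHTML defines more of each.
BLOCK = {"p", "div", "blockquote", "li", "ul", "ol", "table", "h1", "h2", "h3"}
INLINE = {"b", "i", "font", "a", "span", "em", "strong"}


def local(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def xhtml_problems(markup):
    """Return human-readable problems found in an XHTML fragment."""
    try:
        root = ET.fromstring(markup)     # fails on unclosed or misnested tags
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []

    def walk(elem, inside_inline):
        name = local(elem.tag)
        if name != name.lower():
            problems.append(f"<{name}> should be lowercase")
        key = name.lower()
        if inside_inline and key in BLOCK:
            problems.append(f"block <{key}> inside inline <{inside_inline}>")
        if key in INLINE and not inside_inline:
            inside_inline = key          # remember the outermost inline ancestor
        for child in elem:
            walk(child, inside_inline)

    walk(root, None)
    return problems
```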

XHTML has deprecated most inline style elements in favor of specifying style 
externally, through the class attribute added to structural block 
elements. XHTML has excellent interoperability with a wide range of 
browsers, including old ones, except for the actual rendering of some CSS 
styles.

The cost to convert an HTML file to full XML well-formedness is minor for 
you, but this allows you to use XML editors to make sure the document is 
properly nested, a pre-condition that will greatly help its interoperable 
interpretation.

If you have FrontPage XP or 2003, you can use its "Apply XML formatting 
rules" option to do this job nearly automatically, and make sure that all 
elements are properly nested and closed.




Re: Ezra

2004-11-21 Thread Philippe Verdy
From: Edward H. Trager [EMAIL PROTECTED]
Are you saying the difference in names is SIL Ezra vs. Ezra SIL ?
That's too confusing!
You're not the only one to be confused. I had completely forgotten the existence of 
two versions of the same font design. I may have just seen that it used 
PUAs, so I did not install it (I did not remember that it used PUAs, and the 
wording of the sentence that introduced it in this discussion made me think 
that it was NOT using Unicode, and thus not PUAs, which are Unicode things; 
that's why I supposed it was using some legacy Latin-1 override or similar 
hack found in some special-purpose fonts, or in legacy non-TrueType-based 
font formats, like PostScript mappings within a 0-based indexed vector or 
hashed dictionary of glyph names...)




Re: Unicode HTML, download

2004-11-21 Thread John Cowan
Peter Kirk scripsit:

 Please read my earlier posting. Of course it does make things rather 
 difficult that none of my postings ever get approved on a Sunday, 
 especially when I am trying to correct seriously misleading factual errors.

Yr hble Hebrew Moderator attempts to work 24/7, but occasionally the need
to sleep or to engage in business (I was at a conference all last week)
or family business (a death in a friend's family) interferes with this
otherwise laudable goal.

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
In computer science, we stand on each other's feet.
--Brian K. Reid



Re: Unicode HTML, download

2004-11-21 Thread Peter Kirk
On 21/11/2004 17:35, Doug Ewell wrote:
...
This approach will destroy searching capabilities, and will not ensure
proper rendering in any event.  The user who has Miriam but not Ezra SIL
(or vice versa) will see some Hebrew text rendered properly and some
improperly, for no apparent reason. ...
Not true of Ezra SIL, only of SIL Ezra. Sorry to keep repeating myself, 
but these errors keep being perpetuated.

...
(b) code the whole page in the legacy encoding, and provide a link to
Ezra SIL.
 

Ezra SIL does not use a legacy encoding, it is a Unicode font.
Later, Doug wrote:
This makes things a little clearer: Philippe's bad advice to mix
encodings was based on bad information, that doing so would be
necessary.
 

I had already corrected the bad information, and Philippe quoted my 
correction. He simply failed to recognise that SIL Ezra != Ezra SIL. 
Not my choice of naming conventions, but it is consistent with several 
SIL fonts: SIL xxx is a legacy encoded version, xxx SIL is the Unicode 
version of it.

Philippe wrote:
two SIL related fonts: one with legacy encoding (handled in browsers 
as if it was ISO-8859-1 encoded, so that you need to insert text in 
the HTML page using only the code points in the Latin-1 page starting 
at U+, even though they do not represent the correct Unicode 
characters) ...

More bad information. As I already wrote, SIL Ezra is encoded in the PUA
and not as if it were ISO-8859-1 encoded. So this technique will not work.
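
To make the distinction concrete: characters in the Private Use Area (in the BMP, U+E000..U+F8FF) have the Unicode general category "Co" and no standard meaning, while genuine Hebrew letters live in the Hebrew block (U+0590..U+05FF). The Python sketch below classifies code points along those lines. The helper `describe` is hypothetical, and the PUA code point U+F1A0 is an arbitrary example chosen for illustration, not SIL Ezra's actual mapping.

```python
import unicodedata


def describe(ch: str) -> str:
    """Classify one character: private use, Hebrew block, Latin-1, or other."""
    cp = ord(ch)
    if unicodedata.category(ch) == "Co":
        # Private Use Area: the meaning is defined only by whatever font is used
        return f"U+{cp:04X} private use (font-specific meaning)"
    if 0x0590 <= cp <= 0x05FF:
        # Hebrew block: a standard, searchable character with a Unicode name
        return f"U+{cp:04X} {unicodedata.name(ch, '<unassigned>')}"
    if cp <= 0xFF:
        return f"U+{cp:04X} Latin-1: {unicodedata.name(ch, '<unnamed>')}"
    return f"U+{cp:04X} other"


# U+05D0 is HEBREW LETTER ALEF; U+F1A0 is an arbitrary PUA code point
# picked for illustration (hypothetical, not taken from SIL Ezra).
for ch in ["\u05D0", "\uF1A0", "a"]:
    print(describe(ch))
```

A fallback font such as Arial or Times New Roman typically has no standard glyph for a PUA code point like the second one, and a search for U+05D0 will never match it.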

Mixing SIL Ezra with Arial, or similar Unicode encoded fonts will 
never produce the intended fallbacks if users don't have SIL Ezra 
effectively installed and selectable in their browser environment.
Mixing SIL Ezra with Arial, or similar Unicode encoded fonts, is A BAD 
THING. Period. Don't even think of trying it, especially in HTML. 
Instead, use Ezra SIL. And use Times New Roman rather than Arial as the 
fallback because it looks much more similar - or David, which is less 
similar but gives better results.


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Unicode HTML, download

2004-11-21 Thread Peter Kirk
On 21/11/2004 22:23, Philippe Verdy wrote:
From: Doug Ewell [EMAIL PROTECTED]
Cryptically naming these two CSS classes .he and .heb, which
provides no indication of which is the Unicode encoding and which is the
Latin-1 hack, merely makes a bad suggestion worse.

It was not cryptic: "he" was meant for Hebrew (generic, properly 
Unicode-encoded, suitable for any modern Hebrew), and "heb" for Biblical 
Hebrew, where a legacy encoding may still be needed in the absence of 
workable Unicode support for now: ...

A good point, Philippe. Modern and biblical Hebrew are slightly 
different languages, and in principle may need different encodings. 
There are still some small holes in Unicode support for biblical Hebrew, 
most of which will be plugged (in some kind of way) when the current 
pipeline empties itself. (Sorry for mixing my liquid container 
metaphors.) But the current results of displaying biblical Hebrew in 
browsers, at least on Windows, are already much better with Unicode than 
with the legacy encoding, because at least IE6 converts all legacy 
encoded combining marks into spacing marks. Think what French would look 
like if every accent were spacing, and then think much worse for Hebrew
because almost every base character has one or more combining marks.
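
The point about combining marks can be made concrete. In Unicode, a pointed Hebrew word is a sequence of base letters, each followed by zero or more combining marks (general category M*). A renderer that treats those marks as spacing characters, as described above for IE6 with legacy-encoded text, lays out every mark in its own position. This Python sketch groups one word into base letters and their marks; the choice of word and the helper name `split_clusters` are illustrative only.

```python
import unicodedata

# First word of Genesis 1:1 (bereshit), pointed, spelled with explicit
# code points: BET, SHEVA, DAGESH, RESH, TSERE, ALEF, SHIN, SHIN DOT,
# HIRIQ, YOD, TAV.
WORD = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"


def split_clusters(text: str) -> list[tuple[str, list[str]]]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[tuple[str, list[str]]] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and clusters:
            # Combining mark (Mn/Mc/Me): attach it to the preceding base
            clusters[-1][1].append(ch)
        else:
            clusters.append((ch, []))
    return clusters


clusters = split_clusters(WORD)
bases = len(clusters)
marks = sum(len(m) for _, m in clusters)
print(f"{bases} base letters, {marks} combining marks")
```

Here 6 base letters carry 5 combining marks. With spacing rendering, each of those marks would take its own horizontal slot next to the letters instead of stacking on them, and fully accented biblical text carries far more marks than this.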

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Ezra

2004-11-21 Thread Peter Kirk
On 21/11/2004 23:02, Edward H. Trager wrote:
...
Are you saying the difference in names is "SIL Ezra" vs. "Ezra SIL"?
That's too confusing!
 

Confusing, but true.
 When I gave the URL, I checked that it was referring
to an OpenType font --and since it was OpenType, I assumed that it was the
newer version with a Unicode CMAP.  If SIL still has links to legacy non-Unicode
versions of fonts which now also have Unicode versions, then they should make
this really clear to people.  My apologies if I provided the wrong URL.
 

No, you provided the right URL, but the wrong font name.
For reference:
For the Unicode font Ezra SIL, go to 
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=EzraSIL_Home. 
Do use this for web pages, but Hebrew accents may not display properly 
on some systems.

For the legacy font SIL Ezra, go to 
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=SILEzra. 
Don't use this for web pages because it doesn't work with IE6. The 
situation is clearly explained at this URL.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Unicode HTML, download

2004-11-21 Thread Asmus Freytag
At 11:10 AM 11/21/2004, Doug Ewell wrote:
Actually, of course, the only way to *guarantee* that readers will see
the right glyphs is to chuck HTML altogether and create a PDF file.

And that's a task that needs to be approached with some care as well.
The UTC and WG2 constantly get PDF documents with all the interesting
glyphs trashed in them.
For the code charts, I have long given up on embedding fonts and am
using a two-step process of creating a PS file and running Distiller.
For the PS driver I select "convert TT fonts to outlines" or similar
settings, which extract and embed the specific outline information.
I disable all font embedding.
That makes for poorer font quality at small magnification, but absolutely
guarantees that what I put together is what people see. So far, that
has worked well for the purpose.
A./ 




Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
Peter Kirk peterkirk at qaya dot org wrote:

 A good point, Philippe. Modern and biblical Hebrew are slightly
 different languages, and in principle may need different encodings.

English and Russian and Chinese and Hebrew are *very* different
languages, and that still does not justify the confusion of using
different encodings for each within the same document.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Unicode HTML, download

2004-11-21 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 But your advice, Doug, still won't work when multiple fonts in the
 font-family style use distinct encodings: Mixing SIL Ezra with Arial,
 or similar Unicode encoded fonts will never produce the intended
 fallbacks if users don't have SIL Ezra effectively installed and
 selectable in their browser environment.

Don't use multiple fonts in the same font-family style that use
different encodings.  That way lies madness.

 It was not cryptic: "he" was meant for Hebrew (generic, properly
 Unicode-encoded, suitable for any modern Hebrew), and "heb" for Biblical
 Hebrew, where a legacy encoding may still be needed in the absence of
 workable Unicode support for now: this won't be the same language
 however, so a change of encoding may be justified. I was not
 advocating for mixing encodings within the same text for the same
 language...

Don't mix encodings within the same text REGARDLESS of the languages
involved.  If Unicode support (meaning font and rendering-engine
support) is inadequate for one of the languages, then the same
non-Unicode encoding should be used for the whole document.

Documents that used different 8-bit encodings for French and Russian, or
French and Hebrew, or whatever, were central to the ISO 2022-based chaos
of the 1980s.  Rendering these properly was difficult and painful.
Let's not start recommending that path again.
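
As a rough illustration of why one document should not carry several 8-bit encodings, this Python sketch uses only the standard codecs to show the same ISO-8859-8 bytes read under an ISO-8859-1 label, then checks which single encoding can hold a mixed French and Hebrew string. The helper `roundtrips` is hypothetical.

```python
# "shalom" (SHIN, LAMED, VAV, FINAL MEM) in the legacy ISO-8859-8 code page.
shalom_bytes = "\u05E9\u05DC\u05D5\u05DD".encode("iso-8859-8")

# Read under an ISO-8859-1 label, the same bytes turn into Latin letters.
misread = shalom_bytes.decode("iso-8859-1")
print(misread)  # ùìåí


def roundtrips(text: str, encoding: str) -> bool:
    """Return True if `text` survives encoding and decoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # Some character in `text` has no slot in this encoding
        return False


mixed = "caf\u00e9 \u05E9\u05DC\u05D5\u05DD"  # French and Hebrew together
for enc in ("iso-8859-1", "iso-8859-8", "utf-8"):
    print(enc, roundtrips(mixed, enc))
```

Neither 8-bit code page can represent both scripts, so a mixed document needs either in-band encoding switches (the ISO 2022 approach) or a single Unicode encoding form such as UTF-8.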

I do see your logic in choosing "he" and "heb", but "heb" looks like it
could also stand for just "Hebrew".  In fact, "he" and "heb" are
actually the ISO 639 alpha-2 and alpha-3 codes, respectively, for
Hebrew, with no difference in meaning.  Class names (or other
identifiers) should not be so short that they become, well, cryptic.
"hebrew" and "biblical" are possible class names that might be more
easily recognized.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/