Re: Browser support

2002-04-01 Thread Lars Marius Garshol


* Herman Ranes
| 
| My observation is that Opera6.0, MSIE6.0 and Mozilla0.9.8(Win)
| interpret not only Win-1252 -tagged 8-bit HTML as Win-1252, but that
| they interpret also US-ASCII and ISO-8859-1 -tagged 8-bit HTML as
| Win-1252.

* Michael Kaplan
| 
| It is highly doubtful that they are supporting 1252 specifically.
| They are probably using CP_ACP (the default system code page).

In the case of Opera this was a conscious decision. We had the choice
between being correct and showing people lots of square boxes where
people (or software) had mislabeled pages, or 'cheating' a little
without hurting anyone. It was an easy choice, really.
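
As a rough illustration of that kind of 'cheating', here is a minimal Python
sketch. The override table and the function name are hypothetical and are not
taken from Opera or any other browser; the sketch only assumes that Python's
cp1252 codec corresponds to Win-1252.

```python
# Hypothetical override table: labels that a lenient browser might treat
# as windows-1252. Illustrative only, not any browser's actual internals.
LENIENT_OVERRIDES = {
    "us-ascii": "cp1252",
    "iso-8859-1": "cp1252",
    "windows-1252": "cp1252",
}

def lenient_decode(data: bytes, label: str) -> str:
    """Decode page bytes, treating Latin-1-family labels as windows-1252."""
    codec = LENIENT_OVERRIDES.get(label.strip().lower(), label)
    # errors="replace" keeps the few bytes that cp1252 leaves undefined
    # from raising an exception.
    return data.decode(codec, errors="replace")

# A page saved as windows-1252 but labelled ISO-8859-1:
page = b"\x93smart quotes\x94"
print(ascii(lenient_decode(page, "ISO-8859-1")))
```

Labels outside the table are passed through unchanged, so a correctly
labelled UTF-8 page is not affected by the override.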

A browser with a market share like MSIE can hope to educate people
(although it will probably be more successful in making them switch
browsers), but a browser in the position Opera is currently in can
just forget about that.

Many users see any display difference between MSIE and Opera as an
Opera 'bug' and are thoroughly resistant to arguments of the form
"this happens because Opera supports standard X correctly, while MSIE
does not".

-- 
Lars Marius Garshol, Ontopian         http://www.ontopia.net
ISO SC34/WG3, OASIS GeoLang TC        http://www.garshol.priv.no





Re: Browser support

2002-03-20 Thread Jungshik Shin

On Wed, 20 Mar 2002, David Starner wrote:

> On Wed, Mar 20, 2002 at 10:31:41AM +0100, Herman Ranes wrote:
> > Why did Mozilla introduce this 'sloppy' practice in their newer 
> > versions ... ?
> 
> Because its users were getting tired of dealing with little boxes where
> quotes should be, and it was easier to change it at the browser level than
> the web level. I can't imagine anything that it would break.

  I second this. I don't see much harm done by rendering HTML pages
that are in Windows-1252 (but mislabeled as ISO-8859-1 or even US-ASCII)
as intended (that is, as Windows-1252). Mozilla could adhere to the
definition of ISO-8859-1 and render characters outside it as '?' (I mean
characters given NOT as NCRs BUT in their Windows-1252 binary
representation; if they're given as NCRs, Mozilla does and should render
them no matter what encoding/MIME charset is used in the HTML document).
That way Mozilla could try to 'educate' people (web page authors) about
the correct MIME charset name to use for Windows-1252 pages, but before
it achieved anything in this direction, people would simply dismiss it as
not working as well as its competitors (e.g. MS IE) and stick to them.

  There are many cases like Windows-1252 vs ISO-8859-1. One such example
is X-Windows-949 vs EUC-KR. MS products label X-Windows-949 as
ks_c_5601-1987, mistakenly and perhaps intentionally, to hide the fact
that CP949 is a proprietary extension of Microsoft's own invention rather
than the result of abiding by the Korean standard.

  Jungshik Shin





Re: Browser support

2002-03-20 Thread David Starner

On Wed, Mar 20, 2002 at 10:31:41AM +0100, Herman Ranes wrote:
> Why did Mozilla introduce this 'sloppy' practice in their newer 
> versions ... ?

Because its users were getting tired of dealing with little boxes where
quotes should be, and it was easier to change it at the browser level than
the web level. I can't imagine anything that it would break.

-- 
David Starner - [EMAIL PROTECTED]
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)




Re: Browser support

2002-03-20 Thread Otto Stolz

Hello all,

though Bill Kurmey wrote privately, I think this should be discussed
publicly; so I take the liberty of answering on the Unicode list.

I had written:

> - When a notable fraction of your user community uses older browsers,
>   particularly Netscape 4.7:
>   - For characters contained in CP 1252, such as em-dash, trademark symbol,
>     and smart quotes, choose ISO-8859-1 encoding, and use NCRs for the
>     characters not in ISO-8859-1 (but in CP 1252).


Bill Kurmey wrote:

> Please, no.  There are hundreds if not thousands of web pages already
> incorrectly identified as "ISO-8859-1" and which should be identified as
> "Windows-1252" when they contain NCRs not in the range of ISO-8859-1.


Wrong presupposition.

As I had explained, the document charset is fixed: ISO 8859-1 for HTML 2
and HTML 3; UCS for HTML 4. This means
- that an HTML 4 source may legally contain NCRs from the whole UCS/Unicode
   range,
- that the HTTP Content-Type/charset parameter only determines how the bytes
   transmitted to the browser are to be transformed back into characters
   (which, in due course, will be parsed, according to the HTML syntax,
   into tags, entities, NCRs, or text elements).
Cf. , for the gory details.
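
The two stages can be sketched in Python as follows. This is only an
illustration: the function name is hypothetical, and html.unescape from the
standard library stands in for a real HTML parser.

```python
import html

def bytes_to_text(raw: bytes, charset: str) -> str:
    # Stage 1: the charset parameter turns bytes into characters.
    characters = raw.decode(charset)
    # Stage 2: references are resolved against the document character
    # set (UCS), independently of the charset used in stage 1.
    # html.unescape is a stand-in for a real HTML parser here.
    return html.unescape(characters)

source = "A &#8212; B"
for charset in ("iso-8859-1", "utf-8", "utf-16"):
    raw = source.encode(charset)
    print(charset, ascii(bytes_to_text(raw, charset)))
```

The transmitted bytes differ from one charset to the next, but the NCR
resolves to the same character every time.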

In the particular example,
- an HTML 4 page labelled
   
   can contain an em-dash in any of four representations:
   · Bytes 26 6D 64 61 73 68 3B: "&mdash;"
   · Bytes 26 23 38 32 31 32 3B: "&#8212;"
   · Bytes 26 23 78 32 30 31 34 3B: "&#x2014;"
   · Bytes 26 23 58 32 30 31 34 3B: "&#X2014;"
- an HTML 4 page labelled
   
   can contain an em-dash in any of five representations:
   · Bytes 26 6D 64 61 73 68 3B: "&mdash;"
   · Bytes 26 23 38 32 31 32 3B: "&#8212;"
   · Bytes 26 23 78 32 30 31 34 3B: "&#x2014;"
   · Bytes 26 23 58 32 30 31 34 3B: "&#X2014;"
   · Byte  97  : em-dash, encoded in CP 1252
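
The byte sequences listed above can be checked with a short Python sketch.
It assumes that html.unescape resolves references the way a conforming
browser would, and that Python's cp1252 codec matches CP 1252.

```python
import html

EM_DASH = "\u2014"

# Byte sequences copied from the lists above (hex, as transmitted).
references = [
    "26 6D 64 61 73 68 3B",       # named entity
    "26 23 38 32 31 32 3B",       # decimal NCR
    "26 23 78 32 30 31 34 3B",    # hexadecimal NCR, lowercase x
    "26 23 58 32 30 31 34 3B",    # hexadecimal NCR, uppercase X
]

for hex_bytes in references:
    text = bytes.fromhex(hex_bytes).decode("ascii")
    # html.unescape stands in for the browser's reference resolution.
    print(text, html.unescape(text) == EM_DASH)

# The fifth representation exists only under the CP 1252 label.
print(bytes.fromhex("97").decode("cp1252") == EM_DASH)
```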

I should have mentioned that not all of the older browsers handle
hexadecimal NCRs (though Netscape 4.77 does). Hence, I recommend using
only the decimal ones for another couple of years.

Bill Kurmey wrote:

>  Many do not declare the version of HTML in which they were created.


The Doctype declaration is essential. I should have mentioned this in
my previous note. HTML 4 requires a Doctype declaration, cf.
; an HTML source not
containing a Doctype declaration is always assumed to be in HTML 2.0. Hence,
the discussion above depends on a valid HTML 4 Doctype declaration.

> In the versions of Netscape 4.7x which I have tested, none handle ALL
> of the c1 control range used in CP1252 including some of the characters
> you have specified.  Netscape 4.75, for example, does correctly handle
> the smart quotes and em-dash, but the trademark symbol (and others in
> CP1252) appear as white "boxes."


A white box means that Netscape is unable to locate a suitable font to
display the respective character. You can only display characters available
locally; I think I had mentioned this. You can download MS core fonts, in
Truetype format, from .
Those labelled "WGL4" contain all required characters (and more).

As it happens, all 27 of the characters that CP 1252 has in excess of
ISO 8859-1 are mentioned in ;
this page uses decimal NCRs, such as "&#8212;".
- Netscape Communicator 4.77 under Windows 98 Vers 4.10.222 [de]
   displays 25 of them, the exceptions being the two characters
   Z with Hacek (which are, surprisingly, replaced with white boxes).
- Netscape Communicator 4.6 under Mac OS D1 - 8.6 displays 23 of them,
   the exceptions being the four characters S and Z with Hacek.
- Netscape 4.77 [en] under Solaris 8
   · displays 14 of these,
   · displays fall-back representations for another 8 of them,
     viz. "OE", "oe", "S", "s", "Y", "EUR", "f", and "[TM]",
   · displays question marks for 5 of them,
     viz. the Daggers, the Zs with Hacek, and the Per Mille Sign.
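
The count of 27 can be reproduced with a short Python sketch, assuming that
Python's cp1252 codec matches the CP 1252 discussed here. It walks the byte
range 80-9F, which ISO 8859-1 reserves for C1 controls, and keeps every byte
to which CP 1252 assigns a character.

```python
import unicodedata

# Bytes 80-9F are C1 controls in ISO 8859-1; CP 1252 assigns printable
# characters to most of them.
additions = {}
for byte in range(0x80, 0xA0):
    try:
        additions[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        continue  # this byte is left undefined by CP 1252

print(len(additions))
for byte, char in additions.items():
    # Show each character as a decimal NCR next to its Unicode name.
    print(f"{byte:02X}  &#{ord(char)};  {unicodedata.name(char)}")
```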

So, yes, not all of the CP 1252 additions are correctly displayed by
Netscape 4.7; and I apologize for my sloppiness. But, no, Netscape 4.7
has no problem with the Trademark Symbol, nor any other of the symbols
mentioned in the original question.

Best wishes,
   Otto Stolz





Re: Browser support

2002-03-20 Thread Michael (michka) Kaplan

From: "Herman Ranes" <[EMAIL PROTECTED]>

> My observation is that Opera6.0, MSIE6.0 and Mozilla0.9.8(Win)
> interpret not only Win-1252 -tagged 8-bit HTML as Win-1252, but that
> they interpret also US-ASCII and ISO-8859-1 -tagged 8-bit HTML as
> Win-1252.

It is highly doubtful that they are supporting 1252 specifically. They are
probably using CP_ACP (the default system code page).


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





Re: Browser support

2002-03-20 Thread Herman Ranes

My observation is that Opera6.0, MSIE6.0 and Mozilla0.9.8(Win) 
interpret not only Win-1252 -tagged 8-bit HTML as Win-1252, but that 
they interpret also US-ASCII and ISO-8859-1 -tagged 8-bit HTML as 
Win-1252.

However, *earlier* versions of Mozilla *did* display US-ASCII / 
ISO-8859-1 -tagged documents substituting the U+FFFD REPLACEMENT 
CHARACTER for 8-bit data in the 80-FF / 80-9F ranges.
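
For illustration, the earlier strict behaviour can be approximated in Python.
This is a sketch under stated assumptions, not Mozilla's actual code: the
function name is hypothetical and only the two labels mentioned above are
covered.

```python
REPLACEMENT = "\ufffd"

def strict_decode(data: bytes, label: str) -> str:
    """Decode by the declared label only, flagging out-of-range bytes."""
    label = label.strip().lower()
    if label == "us-ascii":
        # Every byte in 80-FF is invalid and becomes U+FFFD.
        return data.decode("ascii", errors="replace")
    if label == "iso-8859-1":
        # 80-9F carry no graphic characters in ISO-8859-1, so they are
        # flagged; A0-FF decode normally.
        return "".join(
            REPLACEMENT if 0x80 <= b <= 0x9F else chr(b) for b in data
        )
    raise ValueError(f"label not covered by this sketch: {label}")

page = b"caf\xe9 \x97 bar"
print(ascii(strict_decode(page, "US-ASCII")))
print(ascii(strict_decode(page, "ISO-8859-1")))
```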

Why did Mozilla introduce this 'sloppy' practice in their newer 
versions ... ?

-Herman Ranes


Otto Stolz wrote:
> 
> Netscape 6.2, Internet Explorer 6.0, and Opera 6.0 comply with
> the HTML 4 character model, as outlined above.
> 



-- 
Herman Ranes              Høgskolen i Sør-Trøndelag
                          Avdeling for teknologi
Telefon   +47 73559606    Institutt for elektroteknikk
Telefaks  +47 73559581
<[EMAIL PROTECTED]>       N-7004 TRONDHEIM
http://www.hist.no/~hra/  NOREG





Re: Browser support

2002-03-19 Thread Otto Stolz

Hello Stuart Somer,

you wrote:

> I find many recommendations not to use Unicode characters for entities
> like em dashes and trademark symbols because there is poor browser support.


According to HTML 4, <http://www.w3.org/TR/html401/charset.html#h-5.3>,
you may use any NCR (numeric character reference), or any entity,
regardless of the encoding <http://www.w3.org/TR/html401/charset.html#h-5.2.2>.

In theory, the Document Character Set is always the Universal Character
Set (UCS, aka Unicode), <http://www.w3.org/TR/html401/charset.html#h-5.1>;
the encoding chosen is just the vehicle to transfer the characters
readily from the server to the client: the characters contained in that
set may be given in their respective binary representation, while any
character may be given as an NCR. A browser should be capable of
displaying all Unicode characters, provided there are suitable fonts
locally available.

In contrast to this theory, Netscape 4.7 displays only characters
that are in the encoding chosen -- with a notable exception: if the
encoding is ISO-8859-1, all CP 1252 characters can be displayed (at least
on Windows systems; I have not extensively tested Netscape on other OSes).
Cf. <http://czyborra.com/charsets/iso8859.html#ISO-8859-1>
and <http://czyborra.com/charsets/codepages.html#CP1252>,
for these character sets.

Netscape 6.2, Internet Explorer 6.0, and Opera 6.0 comply with
the HTML 4 character model, as outlined above.

Hence my recommendation:

- When your user community has Netscape 6.2, Internet Explorer 6.0, or
   Opera 6.0, use any convenient encoding, and insert characters beyond
   the chosen encoding as either NCRs or entities.

- When a notable fraction of your user community uses older browsers,
   particularly Netscape 4.7:
   - For characters contained in CP 1252, such as em-dash, trademark symbol,
     and smart quotes, choose ISO-8859-1 encoding, and use NCRs for the
     characters not in ISO-8859-1 (but in CP 1252).
   - If you need characters beyond CP 1252, choose UTF-8 encoding; depending
     on your editor (and other authoring tools), you may prefer to enter all
     characters directly, or to enter the characters beyond ASCII as
     entities or NCRs.
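
If the pages are generated by a program, this recommendation can be sketched
in Python. The function name is hypothetical; the "xmlcharrefreplace" error
handler is part of the Python standard library and writes decimal NCRs for
every character the chosen encoding cannot represent.

```python
def encode_with_ncrs(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text, writing unencodable characters as decimal NCRs."""
    # The built-in "xmlcharrefreplace" handler emits &#NNNN; forms.
    return text.encode(charset, errors="xmlcharrefreplace")

sample = "Caf\u00e9 \u2014 \u201cUnicode\u201d\u2122"
print(encode_with_ncrs(sample))             # NCRs for the CP 1252 extras
print(encode_with_ncrs(sample, "utf-8"))    # everything encoded directly
```

Note that this sketch does not escape markup characters such as "&" and "<";
that would have to happen before the text is placed into a page.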

In any case, it would be wise to
- stay within the WGL4.0 Character Set,
   cf. <http://www.microsoft.com/typography/otspec/WGL4.htm>,
   as there are suitable fonts freely available,
- test your WWW-pages with all browsers popular in your user community.

> Do you know of a chart for browser support of
> Unicode by browser version?

The most comprehensive discussion I've seen is
<http://www.hclrss.demon.co.uk/unicode/browsers.html>.

Best wishes,
   Otto Stolz





Browser support

2002-03-15 Thread Magda Danish (Unicode)



-Original Message-
Date/Time:Fri Mar 15 12:22:10 EST 2002

Contact:  [EMAIL PROTECTED]

Report Type:  General question

Text of the report is appended below:

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

I am trying to establish text style guides for a new web site and wish to
avoid awkward HTML fixes such as three hyphens for an em dash. I find
many recommendations not to use Unicode characters for entities like em
dashes and trademark symbols because there is poor browser support. In my
testing and research it looks like there is good browser support for the
em dash. I would like it to render correctly in IE and Netscape 4+ on Macs
and Windows machines. Do you know of a chart for browser support of
Unicode by browser version? Thanks, Stuart Somer


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
(End of Report)