Re: Unicode code page and ☃.net

2013-07-30 Thread Jukka K. Korpela

2013-07-30 23:50, James Lin wrote:


> If you open the Windows Character Map, "Segoe UI" doesn't contain the
> snowman while the Meiryo font does.


I wrote about Segoe UI Symbol, not Segoe UI.

Meiryo, which is also shipped with Windows 7, indeed contains SNOWMAN.
This makes it even more odd if SNOWMAN is rendered as empty (in some
sense) when entered into a browser’s address box. If only one font in the
system contained it, it would be conceivable that the font is somehow
corrupted.


Yucca





Re: Unicode code page and ☃.net

2013-07-30 Thread James Lin
If you open the Windows Character Map, "Segoe UI" doesn't contain the
snowman while the Meiryo font does.  So it's probably just a matter of
font support for that particular glyph.

thanks
-james

On 7/29/13 9:29 PM, "Jukka K. Korpela" wrote:

>2013-07-30 4:03, Buck Golemon wrote:
>
>> Also, some browsers have odd support for rendering unicode (non-ascii)
>> urls, for "security" reasons.
>> Both chrome and firefox under Windows 7 render http://www.☃.net/
>> as http://www.xn--n3h.net/ which is the ascii
>> domain encoding (called punycode or idna) of the snowman unicode
>> character.
>
>That’s something that happens after the user has pressed the Enter key
>to visit the page.
>
>If you just cut and paste the URL into the address box, the SNOWMAN
>character should be visible there, provided that the system has a font
>containing it. Windows 7 ships with Segoe UI Symbol version 5, which
>contains SNOWMAN. And ☃ appears when I cut and paste http://www.☃.net/
>into the address box of IE, Firefox, or Chrome.
>
>So I can’t tell why it is empty (no character? a space? an empty
>rectangle?) in some situations. Theoretically at least, it is possible
>that the system contains a faulty font that has a glyph for SNOWMAN but
>the glyph is empty.
>
>What happens when the user presses the Enter key is a story at a
>different protocol level.
>
>Yucca





Re: Unicode code page and ☃.net

2013-07-30 Thread Buck Golemon
On Tue, Jul 30, 2013 at 10:45 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:

> On Tue, 30 Jul 2013 00:15:58 +0300
> "Jukka K. Korpela"  wrote:
>
> > 2013-07-29 23:42, James Lin wrote:
> >
> > > I have a question regarding the supported Unicode code page.
> >
> > There are no Unicode code pages.
>
> Just to be pedantic, there are several on Windows.  They encode the
> coding form (Unicode codes being best thought of as an assignment of
> natural numbers to characters, with certain approved ways of storing
> those numbers), e.g. Code pages 1200 (little-endian UTF-16), 1201
> (big-endian UTF-16), 12000 (little-endian UTF-32), 12001 (big-endian
> UTF-32), 65000 (UTF-7) and 65001 (UTF-8).
>
> Richard.
>
>
I shudder to imagine the circumstances that forced you to learn this
information.


Re: Unicode code page and ☃.net

2013-07-30 Thread Richard Wordingham
On Tue, 30 Jul 2013 00:15:58 +0300
"Jukka K. Korpela" wrote:

> 2013-07-29 23:42, James Lin wrote:
> 
> > I have a question regarding the supported Unicode code page.
> 
> There are no Unicode code pages.

Just to be pedantic, there are several on Windows.  They encode the
coding form (Unicode codes being best thought of as an assignment of
natural numbers to characters, with certain approved ways of storing
those numbers), e.g. Code pages 1200 (little-endian UTF-16), 1201
(big-endian UTF-16), 12000 (little-endian UTF-32), 12001 (big-endian
UTF-32), 65000 (UTF-7) and 65001 (UTF-8).
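
To make the point about "approved ways of storing those numbers" concrete, here is a minimal Python sketch that encodes the snowman under each of the encodings listed above. The pairing of Windows code page numbers with Python codec names, and the helper name encode_all, are assumptions of this sketch rather than anything Windows itself defines.

```python
# Windows code page numbers from the list above, paired with the Python
# codec names assumed to correspond to them (the pairing is an assumption).
CODE_PAGES = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
    65000: "utf-7",
    65001: "utf-8",
}


def encode_all(text):
    """Return {code_page: encoded bytes} for every entry in CODE_PAGES."""
    return {cp: text.encode(codec) for cp, codec in CODE_PAGES.items()}


if __name__ == "__main__":
    snowman = "\u2603"  # SNOWMAN, one code point
    for cp, raw in encode_all(snowman).items():
        # Same code point every time; only the stored byte sequence differs.
        print(cp, CODE_PAGES[cp], raw.hex())
```

The code point stays U+2603 throughout; only the byte sequence that stores it changes from one code page to the next.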

Richard.



Re: Unicode code page and ☃.net

2013-07-29 Thread Jukka K. Korpela

2013-07-30 4:03, Buck Golemon wrote:


> Also, some browsers have odd support for rendering unicode (non-ascii)
> urls, for "security" reasons.
> Both chrome and firefox under Windows 7 render http://www.☃.net/
> as http://www.xn--n3h.net/ which is the ascii
> domain encoding (called punycode or idna) of the snowman unicode character.


That’s something that happens after the user has pressed the Enter key 
to visit the page.


If you just cut and paste the URL into the address box, the SNOWMAN 
character should be visible there, provided that the system has a font 
containing it. Windows 7 ships with Segoe UI Symbol version 5, which 
contains SNOWMAN. And ☃ appears when I cut and paste http://www.☃.net/
into the address box of IE, Firefox, or Chrome.


So I can’t tell why it is empty (no character? a space? an empty
rectangle?) in some situations. Theoretically at least, it is possible
that the system contains a faulty font that has a glyph for SNOWMAN but 
the glyph is empty.


What happens when the user presses the Enter key is a story at a 
different protocol level.
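
The two levels can be sketched in Python, assuming the built-in "idna" codec (which implements the older IDNA 2003 rules) is close enough to what a browser does. The function names wire_host and display_host are made up for this illustration.

```python
from urllib.parse import urlsplit


def wire_host(url):
    """ASCII form of the URL's host name, as it would be sent to DNS."""
    host = urlsplit(url).hostname
    return host.encode("idna").decode("ascii")


def display_host(ascii_host):
    """Unicode form of an ASCII host name, as an address box could show it."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    url = "http://www.\u2603.net/"
    ascii_form = wire_host(url)
    print(ascii_form)                # ASCII-only host name
    print(display_host(ascii_form))  # converted back for display
```

Whether the address box later shows the Unicode or the ASCII form is the browser's policy decision; the conversion itself is lossless in both directions for this name.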


Yucca






Re: Unicode code page and ☃.net

2013-07-29 Thread Buck Golemon
James: Most likely you're encountering a font issue, rather than a unicode
issue.
It's entirely possible that the characters are being interpreted correctly,
but the browser is using a font that doesn't support those characters.

Also, some browsers have odd support for rendering unicode (non-ascii)
urls, for "security" reasons.
Both chrome and firefox under Windows 7 render
http://www.☃.net/
as http://www.xn--n3h.net/ which is the ascii domain encoding (called
punycode or idna) of the snowman unicode character.
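
For anyone curious how the xn-- form arises, here is a rough Python sketch using the standard "punycode" codec on each label. It deliberately skips the normalization and validity checks that real IDNA processing performs, and the names to_ace_label and to_ace_host are hypothetical.

```python
def to_ace_label(label):
    """Encode one DNS label; ASCII-only labels are returned unchanged."""
    if label.isascii():
        return label
    # Punycode the label and add the "xn--" prefix that marks it as encoded.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ace_host(host):
    """Apply to_ace_label to every dot-separated label of a host name."""
    return ".".join(to_ace_label(part) for part in host.split("."))


if __name__ == "__main__":
    print(to_ace_host("www.\u2603.net"))  # www.xn--n3h.net
```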



On Mon, Jul 29, 2013 at 1:42 PM, James Lin wrote:

> Hi,
> I have a question regarding the supported Unicode code page.  I thought
> that once you have the Unicode code page loaded, every glyph or character
> should map and display correctly regardless of which OS or language you
> are using?
>
> However, I have this snowman: ☃
>
> but once I input www.☃.net into the URL field, the snowman displays empty.
> Does anyone know whether this results from Unicode fonts not being used
> correctly?
>
> This also applies if I enter the following Chinese characters on a
> Japanese OS: 橠徇欯幜.NET
>
> Does anyone know why the empty string is showing?
> thanks


Re: Unicode code page and ☃.net

2013-07-29 Thread Stephan Stiller



>> I have a question regarding the supported Unicode code page.
>
> There are no Unicode code pages.


I guess there is the question of what exactly a codepage is when you
consider complicated encodings, especially stateful ones. But I always
think of Unicode as one giant abstract codepage, and Unicode character
encoding[ scheme]s can be codepages on a system.


If there are subtleties, or if someone has corrections or refinements, 
I'll be all ears.


Stephan




Re: Unicode code page and ☃.net

2013-07-29 Thread Jukka K. Korpela

2013-07-29 23:42, James Lin wrote:


> I have a question regarding the supported Unicode code page.


There are no Unicode code pages.


> I thought
> that once you have the Unicode code page loaded, every glyph or character
> should map and display correctly regardless of which OS or language you
> are using?


The Unicode Standard does not require support for all characters as a
matter of conformance, still less correct rendering of them. It requires
that an implementation must not mess up characters, to put it
informally. An implementation that displays “A” as “B” would be faulty;
but an implementation that is unable to display “A” at all could be
conforming.



> However, I have this snowman: ☃


It has relatively good font support:
http://www.fileformat.info/info/unicode/char/2603/fontsupport.htm
But it is quite possible that in some context, a font not containing
it is the one used to render a piece of text, and then a symbol for an
unavailable glyph (like a small rectangle) should appear.
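
The identity of the character does not depend on any font, which can be checked with a small Python sketch using the standard unicodedata module (the helper name describe is invented for this example). It says nothing about whether a given font has a glyph for the character; that remains a separate question.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    print(describe("\u2603"))  # U+2603 SNOWMAN
```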



> but once I input www.☃.net into the URL field, the snowman displays empty.


It displays OK for me. But different web browsers may have different
routines for displaying URLs in the address box. These routines may use
a fixed font that has a fairly limited character repertoire. The font may
depend on the OS, the language you are using (in some sense), or the
phase of the moon. Generally, we can expect modern web browsers to have
routines that can handle a large character repertoire in address fields,
but older browsers may have difficulties in this respect.


Yucca