Off Topic: Unicode

2001-12-13 Thread Jagan K Samuel

Dear All,
  I want to output a particular mathematical operator to a
file. This operator looks like the '=' sign but has one more '-'
underneath the other two. The Unicode number seems to be 2261 (hex).
How can I
1. show this character as it is using System.out.println()
2. output it to a file?

regards
Jagan

===
To unsubscribe: mailto [EMAIL PROTECTED] with body: "signoff JSP-INTEREST".
For digest: mailto [EMAIL PROTECTED] with body: "set JSP-INTEREST DIGEST".
Some relevant FAQs on JSP/Servlets can be found at:

 http://archives.java.sun.com/jsp-interest.html
 http://java.sun.com/products/jsp/faq.html
 http://www.esperanto.org.nz/jsp/jspfaq.jsp
 http://www.jguru.com/faq/index.jsp
 http://www.jspinsider.com



Re: Off Topic: Unicode

2001-12-13 Thread Mike Akerman

On Thu, 13 Dec 2001, Jagan K Samuel wrote:

> Dear All,
>   I want to output a particular mathematical operator to a
> file. This operator looks like the '=' sign but has one more '-'
> underneath the other two. The Unicode number seems to be 2261 (hex).
> How can I
> 1. show this character as it is using System.out.println()
> 2. output it to a file?
>
> regards
> Jagan
>

"System.out.println("\u2261");" should work.  Incidentally, your question
got me wondering if I could do Unicode from servlets, and view the results
in normal browsers.  I have a test servlet for Unicode now.  You have to
install the language packs for Greek, Hebrew, Japanese etc.  This is
easily done from Internet Explorer->View->Encoding->More->Hebrew.  It will
then prompt for the Windows 2000 disc and install the language fonts.
After the language packs are installed through Internet Explorer, the
servlet works in Opera 6 and Netscape 6 as well.

Clearly this could be translated to JSP easily.

Incidentally, you can do one extra language at a time if you set the
charset to ISO-8859-1 or ISO-8859-7, etc., depending of course on which
language you want, and on whether the language fonts are on your computer.

Michael Akerman



import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class UnicodeServlet extends HttpServlet
{
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException
    {
        // Set the charset before getWriter() so the writer encodes as UTF-8.
        res.setContentType("text/html; charset=UTF-8");
        PrintWriter out = res.getWriter();

        // The archived copy lost the HTML tags inside the string literals;
        // the table markup here is a plausible reconstruction.
        out.println("<html>");
        out.println("<head></head>");
        out.println("<body>");
        out.println("<table>");
        printRange(out, 0x0030, 0x00ff); // Latin
        printRange(out, 0x0370, 0x03ff); // Greek
        printRange(out, 0x3040, 0x30ff); // Hiragana and Katakana
        printRange(out, 0x0590, 0x05ff); // Hebrew
        out.println("</table>");
        out.println("</body>");
        out.println("</html>");
    }

    // Writes one table cell per character, starting a new row every 16.
    private void printRange(PrintWriter out, int first, int last)
    {
        for (int i = first; i <= last; i++)
        {
            if (i % 16 == 0) out.println("<tr>");
            out.print("<td>"); out.write(i);
        }
    }
}




Re: Off Topic: Unicode

2001-12-14 Thread Anthony Tagunov

Hello Mike!

MA> "System.out.println("\u2261");"
Hmmm.. It looks like the System.out writer is set up for something like
ISO-8859-1, so will this work?
MA> Incidentally, your question
MA> got me wondering if I could do Unicode from servlets, and view the results
MA> in normal browsers.
Again, the 'out' writer is set up for the charset you declared in the
<%@ page %> directive, so this char probably won't get through and will
turn into a ? unless you have contentType="text/html;charset=utf-8"

Maybe doing &#2261; would be better in a general setting?

Best regards, Anton




Re: Off Topic: Unicode

2001-12-17 Thread Mike Akerman

> MA> "System.out.println("\u2261");"
>
> Hmmm.. It looks like the System.out writer is set up for something like
> ISO-8859-1, so will this work?

"System.out" is a PrintStream so it depends on the default encoding. From
the JavaDocs:

All characters printed by a PrintStream are converted into bytes using the
platform's default character encoding. The PrintWriter class should be
used in situations that require writing characters rather than bytes.

http://java.sun.com/j2se/1.4/docs/api/index.html

If your platform default encoding is ISO-8859-1, then indeed
"System.out.println("\u2261");" isn't going to output the character, as it's
not in the Unicode to ISO mappings -- http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
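
The mapping gap can be checked directly. The following is only a sketch: it assumes a Java 7 or later runtime (for StandardCharsets), and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Returns true when ISO-8859-1 has a byte for the given character.
    static boolean fitsLatin1(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    // Encodes text as ISO-8859-1; unmappable characters are replaced
    // by the charset's default replacement, which is '?'.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1('='));               // true
        System.out.println(fitsLatin1('\u2261'));          // false
        System.out.println((char) toLatin1("\u2261")[0]);  // ?
    }
}
```

So on a Latin-1 console the character would be expected to come out as a question mark rather than raise an error.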

>> Maybe doing &#2261; would be better in a general setting?

If you do this through a "System.out.write(&#2261;)", it might be
interpreted as a byte instead of a character, which might actually make
it through the filters, though I think it would be best to try to tweak the
platform default encoding to UTF-8 instead.  If anyone has any empirical
evidence one way or another, let me know.

Michael Akerman




Re: Off Topic: Unicode

2001-12-18 Thread Anthony Tagunov

Hello Mike and everybody!

>> MA> "System.out.println("\u2261");"
>>
>> Hmmm.. It looks like the System.out writer is set up for something like
>> ISO-8859-1, so will this work?

MA> "System.out" is a PrintStream so it depends on the default encoding.
MA> ...
MA> If your platform default encoding is ISO-8859-1, then indeed
MA> "System.out.println("\u2261");" isn't going to output, as it's not in the
MA> Unicode to ISO mappings --
MA> http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
Thanks for clearing this up!

>>> Maybe doing &#2261; would be better in a general setting?
Well, I was speaking about doing out.write("&#2261;") in a _servlet_.
I believe that if we do it, we'll get the &#2261; sequence output
directly to the HTML page, and the browser may recognize this as
an HTML encoding of a Unicode character. This is a reasonable
option even if we have ISO-8859-1 as our page encoding.
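
One way to sketch that idea: escape every character that ISO-8859-1 cannot carry as a decimal numeric character reference before writing it to the page. The helper name escapeNonLatin1 is hypothetical, and the sketch deliberately does not escape '<', '>' or '&'.

```java
public class NumericRefs {
    // Replaces every character outside ISO-8859-1 (code point > 0xFF)
    // with a decimal numeric character reference such as &#8801;.
    static String escapeNonLatin1(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp <= 0xFF) {
                sb.append((char) cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonLatin1("a \u2261 b")); // a &#8801; b
    }
}
```

In a servlet or JSP the result would be passed to 'out' instead of System.out; the output is plain ASCII, so it survives any page encoding.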

MA> If you do this through a "System.out.write(&#2261;)", it might be
MA> interpreted as a byte instead of a character...
Well, I believe that if we really do 'System.out.write("&#2261;");'
we'll get "&#2261;" on our system console, but this is quite useless,
isn't it? Surely it won't "be interpreted as a byte".

Best regards, Anton




Re: Off Topic: Unicode

2001-12-18 Thread Mike Akerman

> >>> Maybe doing &#2261; would be better in a general setting?

> Well, I was speaking about doing out.write("&#2261;") in a _servlet_.
> I believe that if we do it, we'll get the &#2261; sequence output
> directly to the HTML page, and the browser may recognize this as
> an HTML encoding of a Unicode character. This is a reasonable
> option even if we have ISO-8859-1 as our page encoding.
>
> MA> If you do this through a "System.out.write(&#2261;)", it might be
> MA> interpreted as a byte instead of a character...
>
> Well, I believe that if we really do 'System.out.write("&#2261;");'
> we'll get "&#2261;" on our system console, but this is quite useless,
> isn't it? Surely it won't "be interpreted as a byte".
>
> Best regards, Anton

Well, I had temporarily confused &#2261; with legal Java hex, 0x2261.  I
meant to say "System.out.write(0x2261)"

Bringing up the "&#nnnn;" representation is a good idea, but HTML character
entities should be decimal numbers and "\u2261"  converted from hex to
decimal is 8801.
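
The hex to decimal step can be done in code rather than by hand. A small sketch, with a made-up helper name:

```java
public class HexToDecimalRef {
    // Builds a decimal numeric character reference from a hex code point,
    // e.g. "2261" becomes "&#8801;".
    static String decimalRef(String hexCodePoint) {
        return "&#" + Integer.parseInt(hexCodePoint, 16) + ";";
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("2261", 16)); // 8801
        System.out.println(decimalRef("2261"));           // &#8801;
    }
}
```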

Originally, I highly doubted that something as high as 8801 was doable as
an HTML character entity.  According to "Webmaster in a Nutshell", these
HTML character entities must be ISO-8859-1 characters, and not even the
full 256 ISO-8859-1 character set is supported.

However I checked anyway and "&#8801;" works great --just to show how
accurate a book from June 99 is.  I ran:

for(int i=0x2200; i<=0x22ff; i++)
{
    if ( i % 16 == 0 ) out.println("<br>");
    out.print("&#"+i+";");
}

To print the entire "Unicode Mathematical Operators" set.  About 1/4
displayed in Internet Explorer 6.0, and the full set displayed in Netscape
6.1 and Opera 6.0.

So it looks like 'System.out.println("&#8801;");' is the solution as you
said.

Michael Akerman




Re: Off Topic: Unicode

2001-12-18 Thread Anthony Tagunov

Hello Mike and everybody!

>> >>> Maybe doing &#2261; would be better in a general setting?

>> Well, I was speaking about doing out.write("&#2261;") in a _servlet_.
> Bringing up the "&#nnnn;" representation is a good idea, but HTML character
> entities should be decimal numbers and "\u2261"  converted from hex to
> decimal is 8801.
Thanks! And we could also do "&#x2261;" to avoid the conversion to
decimal. This way we enlarge our HTML page by 1 byte, of course.

MA> Originally, I highly doubted that something as high as 8801 was doable as
MA> an HTML character entity.  According to "Webmaster in a Nutshell", these
MA> HTML character entities must be ISO-8859-1 characters, and not even the
MA> full 256 ISO-8859-1 character set is supported.
This is an excerpt from the HTML 4.0 spec, http://www.w3.org/TR/REC-html40

'3.2.3 Character references

Character references are numeric or symbolic names for characters that
may be included in an HTML document. They are useful for referring to
rarely used characters, or those that authoring tools make it difficult
or impossible to enter. You will see character references throughout
this document; they begin with a "&" sign and end with a semi-colon (;).
Some common examples include:

 "&lt;" represents the < sign.
 "&gt;" represents the > sign.
 "&quot;" represents the " mark.
 "&#229;" (in decimal) represents the letter "a" with a small circle above it.
 "&#1048;" (in decimal) represents the Cyrillic capital letter "I".
 "&#x6C34;" (in hexadecimal) represents the Chinese character for water.'

So these look pretty much like Unicode character codes.
This way we can embed, say, Cyrillic into ISO-8859-1 coded pages, but it
is quite wasteful: instead of two bytes per char as we would need
with UTF-8, or 1 byte per char as with windows-1251, we need at least 7.

But if these are just a few chars in the doc, it is quite affordable!
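
The byte counts can be compared with a short sketch. It assumes a Java 7 or later runtime; the class and method names are made up, and windows-1251 is left out because not every runtime ships that charset.

```java
import java.nio.charset.StandardCharsets;

public class ReferenceCost {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes a decimal reference such as &#1048; occupies.
    // The reference is pure ASCII, so one char is one byte.
    static int decimalRefBytes(int codePoint) {
        return ("&#" + codePoint + ";").length();
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("\u0418"));      // 2 (Cyrillic capital I)
        System.out.println(decimalRefBytes(0x0418));  // 7
        System.out.println(utf8Bytes("\u2261"));      // 3
        System.out.println(decimalRefBytes(0x2261));  // 7
    }
}
```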

MA> However I checked anyway and "&#8801;" works great --just to show how
MA> accurate a book from June 99 is.  I ran:
MA> for(int i=0x2200; i<=0x22ff; i++)
MA> {
MA>     if ( i % 16 == 0 ) out.println("<br>");
MA>     out.print("&#"+i+";");
MA> }
MA> To print the entire "Unicode Mathematical Operators" set.  About 1/4
MA> displayed in Internet Explorer 6.0, and the full set displayed in Netscape
MA> 6.1 and Opera 6.0.
Thanks for investigating this, it's an interesting result!

MA> So it looks like 'System.out.println("&#8801;");' is the solution as you
MA> said.

Why System.out ;-) ?
Surely it is just 'out' if we're in a JSP, or whatever you have named it
if it is your own servlet!
So, 'out.write("&#8801;");' is the solution!
(Please excuse me for being over-pedantic!)
Best regards,
 Anton




Re: Off Topic: Unicode

2001-12-18 Thread Mike Akerman

> This is an excerpt from the HTML 4.0 spec, http://www.w3.org/TR/REC-html40

Thanks, I had been using an O'Reilly book for my research and there was
much omitted.  The HTML spec is definitely a better source of info.

> MA> So it looks like 'System.out.println("&#8801;");' is the solution as you
> MA> said.
>
> Why System.out ;-) ?

Because, if you follow the thread back to the beginning, that was what the
original question was about.  This was also why I was thrown for a loop
when you used "&#2261;", as I wasn't expecting HTML/JSP output.  He
wanted to use "System.out" and redirect it to a file, if I recall.
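
For that original question, one approach that does not depend on the platform default encoding is to name the encoding explicitly. This is only a sketch, assuming a Java 7 or later runtime; the class name, method name and temp file name are made up for illustration.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteIdenticalTo {
    static final String IDENTICAL_TO = "\u2261";

    // Writes the text as UTF-8, independent of the platform default encoding.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // 2. Output to a file (a temp file here, purely for the demo).
        Path file = Files.createTempFile("identical", ".txt");
        writeUtf8(file, IDENTICAL_TO);
        // U+2261 is three bytes in UTF-8: E2 89 A1.
        System.out.println(Files.readAllBytes(file).length + " bytes written");

        // 1. Console: wrap System.out in a PrintStream with an explicit
        // encoding. Whether the glyph shows up still depends on the
        // terminal's own encoding and font.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(IDENTICAL_TO);
    }
}
```

If the output of plain System.out is redirected to a file instead, the bytes in that file follow the platform default encoding, which is why the explicit writer is the safer route.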

Michael Akerman
