Re: Unicode character transformation through XSLT

2003-03-14 Thread Markus Scherer
Nooo - Java's old "UTF" functions do not process UTF-8! They are there for String serialization, a Java-internal format. Use the Java Reader/Writer classes instead of these old ones! See the Java tutorials on Internationalization: http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.htm
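
To illustrate the point about the old "UTF" functions, here is a minimal sketch comparing `DataOutputStream.writeUTF` with a standard UTF-8 encoder. The class and helper names (`ModifiedUtf8Demo`, `viaWriteUTF`, `viaCharset`, `hex`) are made up for this example, and `StandardCharsets` assumes a newer Java than the one in use when the thread was written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {
    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length
    // prefix followed by Java's "modified UTF-8" serialization format.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by a standard UTF-8 encoder.
    static byte[] viaCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "A\0"; // letter A followed by U+0000
        System.out.println(hex(viaWriteUTF(s))); // 00 03 41 C0 80
        System.out.println(hex(viaCharset(s)));  // 41 00
    }
}
```

The length prefix and the two-byte form of U+0000 are why data written with `writeUTF` should only be read back with `readUTF`, not treated as general UTF-8 text.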

Re: Unicode character transformation through XSLT

2003-03-13 Thread Yung-Fong Tang
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Tuesday, March 11, 2003 6:09 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT Because the foll

Re: Unicode character transformation through XSLT

2003-03-13 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote:
> I modified my program as per your suggestion (modified to byChunk&127),
Sorry, I was much too hasty with my reply. First of all, I should have written byChunk&255. And secondly, solutions like the one Markus proposes are much better thought out. My apologies. P
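
A small sketch of what the `& 255` mask does and does not fix, using the en dash bytes E2 80 93 discussed elsewhere in the thread. The names `MaskDemo`, `maskBytes`, and `codeUnits` are hypothetical. The mask stops the sign extension, but each byte still becomes its own character, so the result is not a UTF-8 decode.

```java
class MaskDemo {
    // Widens each byte to a char after masking off the sign-extended
    // high bits, so 0xE2 becomes U+00E2 instead of U+FFE2.
    static String maskBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 255));
        }
        return sb.toString();
    }

    // Lists the UTF-16 code units of a string in U+XXXX notation.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] enDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};
        // Three separate characters, not the single character U+2013.
        System.out.println(codeUnits(maskBytes(enDash))); // U+00E2 U+0080 U+0093
    }
}
```

This is consistent with the remark that a Reader-based solution is better thought out: masking only treats the input as one character per byte.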

Re: Unicode character transformation through XSLT

2003-03-12 Thread Markus Scherer
Generally, try instantiating an InputStreamReader or similar from your input, with an explicit encoding="UTF8". That will perform the conversion from UTF-8 to the internal 16-bit Unicode that Java processes. Always use XYZReader classes for text input and XYZWriter classes for text output. java
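
As a rough sketch of this advice, the following reads a byte stream through an `InputStreamReader` with an explicit encoding. The class and method names (`ReaderDecodeDemo`, `readAll`) and the sample bytes are assumptions for illustration; the charset is spelled "UTF-8" here, and Java also accepts the historical name "UTF8" used in the message.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReaderDecodeDemo {
    // Decodes the whole stream as UTF-8. The Reader returns decoded
    // UTF-16 code units, not raw bytes.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, "UTF-8")) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a", the three UTF-8 bytes of the en dash, then "b".
        byte[] utf8 = {'a', (byte) 0xE2, (byte) 0x80, (byte) 0x93, 'b'};
        String text = readAll(new ByteArrayInputStream(utf8));
        System.out.println(text.length());                   // 3
        System.out.printf("U+%04X%n", (int) text.charAt(1)); // U+2013
    }
}
```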

RE: Unicode character transformation through XSLT

2003-03-12 Thread Jain, Pankaj (MED, TCS)
Subject: Re: Unicode character transformation through XSLT
Jain, Pankaj (MED, TCS) wrote:
> while((chunk = ipStream.read())!=-1)
> {
> byte byChunk = new Integer(chunk).byteValue();
> strBuf.append((char) byChunk);
> }
You don't say which type your "chunk" variable is, but

Re: Unicode character transformation through XSLT

2003-03-12 Thread John Cowan
Pim Blokland wrote:
> As I understand it, char is a signed 16 bits type in Java; any of
> the others may be unsigned. Hence the problem.
Char is *unsigned*, all the others are always signed.
--
John Cowan
"May the hair on your toes never fall out!" --Thorin Oakenshield (to Bilbo
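
The signedness rule can be checked directly. This is only an illustrative sketch; `SignednessDemo` and its helpers are invented names.

```java
class SignednessDemo {
    // Narrows to byte, then widens back to int: sign-extends.
    static int viaByte(int bits) {
        return (byte) bits;
    }

    // Narrows to short, then widens back to int: sign-extends.
    static int viaShort(int bits) {
        return (short) bits;
    }

    // Narrows to char, then widens back to int: zero-extends.
    static int viaChar(int bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        System.out.println(viaByte(0xFF));    // -1
        System.out.println(viaShort(0xFFFF)); // -1
        System.out.println(viaChar(0xFFFF));  // 65535
        System.out.println(Byte.MIN_VALUE);   // -128
        System.out.println((int) Character.MIN_VALUE); // 0
    }
}
```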

Re: Unicode character transformation through XSLT

2003-03-12 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote:
> while((chunk = ipStream.read())!=-1)
> {
> byte byChunk = new Integer(chunk).byteValue();
> strBuf.append((char) byChunk);
> }
You don't say which type your "chunk" variable is, but the problem is definitely in the number of conversions you do. In this tiny piec

RE: Unicode character transformation through XSLT

2003-03-12 Thread Jain, Pankaj (MED, TCS)
you have any information on this, then please let me know.
Thanks
-Pankaj
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Monday, March 10, 2003 7:59 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation thro

Re: Unicode character transformation through XSLT

2003-03-11 Thread Yung-Fong Tang
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Monday, March 10, 2003 7:59 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT
. Pankaj Jain wrote,
> My problem is that, I am getting Unic

Re: Unicode character transformation through XSLT

2003-03-11 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote:
> But still I have a doubt that why \uFFE2\uFF80\uFF93 is giving ndash in
> html.
In HTML? No way! HTML can't interpret series of hex bytes. Try &ndash; or &#8211;.
Pim Blokland

RE: Unicode character transformation through XSLT

2003-03-11 Thread Jain, Pankaj (MED, TCS)
7:59 PM To: Jain, Pankaj (MED, TCS) Cc: '[EMAIL PROTECTED]' Subject: Re: Unicode character transformation through XSLT
. Pankaj Jain wrote,
> My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
> from resource bundle property file which is equivalent to ndash(

Re: Unicode character transformation through XSLT

2003-03-11 Thread Markus Scherer
Kenneth Whistler wrote:
> "Unicode character (\uFFE2\uFF80\uFF93)"
> ...
> What you are actually looking for is the UTF-8 sequence: 0xE2 0x80 0x93
The 8-bit UTF-8 bytes E2 80 93 (all with the most significant bit set) get *sign-extended* to 16 bits, producing FFE2 FF80 FF93. It should suffice in a UT
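
The sign extension described here can be reproduced with a few lines. This is a sketch under assumed names (`SignExtensionDemo`, `castBytes`, `decodeUtf8`), not the original poster's program; it contrasts a plain `(char)` cast of each byte with an actual UTF-8 decode.

```java
import java.nio.charset.StandardCharsets;

class SignExtensionDemo {
    // Casts each byte straight to char. A byte is signed, so any value
    // of 0x80 or above is sign-extended and lands in the 0xFF80..0xFFFF range.
    static String castBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes the same bytes as UTF-8 instead of casting them.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] enDash = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};
        for (char c : castBytes(enDash).toCharArray()) {
            System.out.printf("%04X ", (int) c);      // FFE2 FF80 FF93
        }
        System.out.println();
        System.out.printf("%04X%n", (int) decodeUtf8(enDash).charAt(0)); // 2013
    }
}
```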

Re: Unicode character transformation through XSLT

2003-03-10 Thread Kenneth Whistler
Well, I can't diagnose exactly what is going wrong, but "Unicode character (\uFFE2\uFF80\uFF93)" is a sequence of a full-width not sign, followed by a half-width katakana ta and a half-width katakana mo. What you are actually looking for is the UTF-8 sequence: 0xE2 0x80 0x93 which is the UTF-8
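
For reference, the relationship between U+2013 and the bytes E2 80 93 can be worked out by hand. The sketch below uses a hypothetical helper, `encode3`, that applies the three-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) and is only meant for code points from U+0800 to U+FFFF outside the surrogate range; it compares the result with the JDK's own encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ThreeByteDemo {
    // Splits a 16-bit code point into 4 + 6 + 6 bits and adds the
    // UTF-8 lead byte marker (0xE0) and continuation markers (0x80).
    // Assumes c is in U+0800..U+FFFF and is not a surrogate.
    static byte[] encode3(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    public static void main(String[] args) {
        byte[] manual = encode3('\u2013');
        byte[] library = "\u2013".getBytes(StandardCharsets.UTF_8);
        for (byte b : manual) {
            System.out.printf("%02X ", b & 0xFF);             // E2 80 93
        }
        System.out.println();
        System.out.println(Arrays.equals(manual, library));  // true
    }
}
```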

Re: Unicode character transformation through XSLT

2003-03-10 Thread jameskass
. Pankaj Jain wrote,
> My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
> from resource bundle property file which is equivalent to ndash(-) and
> its
U+2013 is the ndash (–). It is represented in UTF-8 by three hex bytes: E2 80 93. But, \uFFE2 is fullwidth not sign \uF

Unicode character transformation through XSLT

2003-03-10 Thread Jain, Pankaj (MED, TCS)
Hi, My problem is that I am getting the Unicode character (\uFFE2\uFF80\uFF93) from a resource bundle property file, which is equivalent to ndash (-). It works fine in HTML and XML, but during transformation through XSLT it cannot be interpreted, and hence I am getting ??? instead of ndash.