I disagree, Claude. The problem he is having is that the source charset does not match the actual data coming across. I think his issue is NCR (numeric character reference) data, i.e. text such as &#233; in place of the encoded character itself.
Java should recognize the correct charset, and in combination with that you can use the isUnmappable() method to make sure you have no garbled text. You would pass the value returned by getEncoding() into your isUnmappable() function as arguments.charsetA, with arguments.field holding the text and arguments.charsetB the charset you want to decode into. The meat of an isUnmappable() function would be something like this:

<cfscript>
// get instances of the Java classes we need
this.jcharset = createObject('java', 'java.nio.charset.Charset');
this.byteBuffer = createObject('java', 'java.nio.ByteBuffer');
this.charBuffer = createObject('java', 'java.nio.CharBuffer');
this.codingErrorAction = createObject('java', 'java.nio.charset.CodingErrorAction');

// format into your function starting here

// the charset the text is coming from
charsetBefore = this.jcharset.forName(arguments.charsetA);
// encode the text into a byte buffer using that charset
inTextByteBuffer = charsetBefore.encode(arguments.field);
// the charset you are going to
charsetAfter = this.jcharset.forName(arguments.charsetB);
// get a new decoder for the target charset
decoderForCharsetAfter = charsetAfter.newDecoder();
// report (rather than silently replace) unmappable characters
decoderForCharsetAfter.onUnmappableCharacter(this.codingErrorAction.REPORT);
// size the output buffer from the byte count so the decode cannot overflow
// (an overflow would make isUnmappable() return false even for bad data)
bLength = ceiling(inTextByteBuffer.remaining() * decoderForCharsetAfter.maxCharsPerByte());
outTextCharBuffer = this.charBuffer.allocate(javaCast('int', bLength));
// decode the bytes with the target charset and ask the CoderResult whether an
// unmappable character was hit (malformed input is reported separately via isMalformed())
decoderCoderResult = decoderForCharsetAfter.decode(inTextByteBuffer, outTextCharBuffer, true).isUnmappable().toString();
// return the boolean value in a struct
ret.data.isUnmappable = decoderCoderResult;
</cfscript>

This makes sure you have no unmappable characters before you proceed. If you do have NCR data in the text, you can check for it with a regex and then run the matches through ncr2unicode to replace those references with the real characters (the NCRs themselves will be plain ASCII). Here is an ncr2unicode servlet I compiled.
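For anyone who wants to try the same two steps outside ColdFusion, here is a minimal plain-Java sketch, offered as an illustration rather than a definitive implementation. hasCodingError() makes the same java.nio calls as the cfscript, but also reports malformed input, which is what, for example, the UTF-8 and US-ASCII decoders return for invalid bytes. decodeNcrs() shows roughly what an ncr2unicode step does. The class and method names are hypothetical, and this is not the code of the ncr2unicode servlet linked below.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetCheck {

    // Encode `text` with charsetA, decode the bytes with charsetB, and report
    // whether the decoder hit an unmappable or malformed byte sequence.
    // Charset.encode() itself replaces characters charsetA cannot represent
    // (typically with '?'), so only the decoding side is checked here.
    static boolean hasCodingError(String text, String charsetA, String charsetB) {
        ByteBuffer in = Charset.forName(charsetA).encode(text);
        CharsetDecoder decoder = Charset.forName(charsetB).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Size the output from the byte count so decode() cannot overflow.
        int capacity = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);
        CoderResult result = decoder.decode(in, out, true);
        return result.isUnmappable() || result.isMalformed();
    }

    // Decimal (&#233;) and hexadecimal (&#xE9;) numeric character references.
    private static final Pattern NCR = Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replace each NCR with the character it names; references that are too
    // large or not valid code points are left untouched.
    static String decodeNcrs(String text) {
        Matcher m = NCR.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group();
            try {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // Number does not fit in an int: keep the reference as-is.
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin-1 bytes read back as UTF-8: reported as a coding error
        System.out.println(hasCodingError("caf\u00e9", "ISO-8859-1", "UTF-8"));
        // Decimal and hexadecimal references are both expanded
        System.out.println(decodeNcrs("caf&#233; &#x263A;"));
    }
}
```

Checking for malformed input as well as unmappable characters is a deliberate choice here: a byte-level mismatch between the declared and actual charset usually shows up as malformed input, so checking isUnmappable() alone can miss it.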
http://phillipholmes.com/Java/ncr2unicode.rar

You'll need WinRAR to unpack that.

Warmest Regards,

Phillip B. Holmes
http://phillipholmes.com

-----Original Message-----
From: Claude Schneegans [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 17, 2006 1:45 PM
To: CF-Talk
Subject: Re: CFHTTP Charset Question

>>getEncoding()

This "Retrieves the Charset as guessed from the underlying InputStream". But if the charset is not specified in the response header and CF does not interpret the characters correctly, CF has probably guessed wrong, so this won't really help.

The only way I can see would be to get the page first, in whatever charset, decode the line

<?xml version="1.0" encoding="iso-8859-1" ?>

and repeat the HTTP request specifying the right charset.
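Claude's suggestion in the quoted message (read the encoding out of the XML declaration, then decode with it) can be sketched in plain Java. This is a minimal illustration that assumes you already have the raw response bytes; the class and method names are hypothetical. Instead of repeating the HTTP request, it decodes the bytes it already holds once the declared charset is known. If all you have is a string CF has already decoded, the second request with the right charset is still the way to go.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlPrologCharset {

    // Finds encoding="..." in a leading <?xml ... ?> declaration, optionally
    // preceded by a UTF-8 byte order mark (seen here as three ISO-8859-1 chars).
    private static final Pattern PROLOG_ENCODING = Pattern.compile(
            "^(?:\u00EF\u00BB\u00BF)?\\s*<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the charset named in the XML declaration, or `fallback` when
    // there is no declaration or this JVM does not support the named charset.
    static Charset declaredCharset(byte[] body, Charset fallback) {
        // The declaration is plain ASCII, so decoding the first bytes as
        // ISO-8859-1 works for ASCII-compatible encodings (not UTF-16/32).
        int n = Math.min(body.length, 256);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = PROLOG_ENCODING.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    // Decode the whole body with the declared charset; XML 1.0 treats a
    // document with no declaration and no byte order mark as UTF-8.
    static String decodeXml(byte[] body) {
        return new String(body, declaredCharset(body, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?><p>caf\u00e9</p>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredCharset(body, StandardCharsets.UTF_8));
        System.out.println(decodeXml(body));
    }
}
```

Falling back to UTF-8 when no encoding is declared follows the XML default; a response whose HTTP header names a charset may warrant using that as the fallback instead.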