Thanks for taking a look, Paul. You're a little over my head with all of the character codes. Seems I might have been a little optimistic in my self-evaluation if what you're talking about is obvious. :-)
Here's what I know (and it's probably best to treat this as /all/ that I know):

1. The feed itself specifies its encoding as iso-8859-1 in the source XML.
2. When I make the GET request specifying the charset as iso-8859-1, everything returns fine (at least for this feed).
3. When I make the GET request specifying either no charset at all or utf-8, the non-ASCII characters (what I, perhaps mistakenly, referred to as members of an "extended character set") are not rendered properly in the cfhttp.fileContent variable.

Here's what I've been assuming (clearly incorrectly):

1. Specifying a utf-8 character set would handle most of the non-ASCII characters and allow them to render properly.
2. ColdFusion, if no charset was specified and no content-type was specified in the response header, would default to UTF-8. This is straight from the CFHTTP docs, if I read and understand them correctly.

You're right about the best practice of stating a charset explicitly, and I think I'll go in that direction. The question then becomes: what charset can I specify that will handle almost any feed out there? UTF-8, I had thought, would handle the French alphabet at the very least.

Thanks again for your help.

On 5/17/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> Rob Wilkerson wrote:
> > cfhttp.fileContent variable unless the charset is specified as
> > "iso-8859-1". This is the character set specified within the feed
> > XML, but it's not specified in the response header. The response
> > charset is an empty string.
>
> well first off that stuff isn't latin-1 (iso-8859-1) it appears to be
> windows-1252 codepage which is more like a super set of latin-1. so there's
> going to be mapping issues if you try to treat the feed as latin-1. people
> often confuse the two. next, they seem to have inserted some NCR goop into
> the content (but i'm not sure if that's according to some RSS spec but given
> other chars in the content, i'll guess "no"). btw what do you mean by
> "extended character set"?
>
> > Even specifying UTF-8 as the charset (I know it's the default, but it
> > was worth trying explicitly) does not return the characters properly.
>
> um it's NOT the default for stuff like cfhttp, cffile, etc. i think cf picks
> that up from the server. you *have* to specify the charset for those tags &
> should anyway as good practice.
>
> > 1. Why doesn't UTF-8 return the characters properly? I thought that,
> > for most content, UTF-8 would handle the vast majority of characters -
> > certainly the French language's accented "e", etc.
>
> windows-1252 <> latin-1.
>
> > 2. Do I have any options for returning these characters properly and,
> > if any, what are they?
>
> probably, don't know yet until i look.
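
As a rough illustration of the latin-1 / windows-1252 / UTF-8 mismatch discussed above, here is a minimal sketch in Python rather than ColdFusion. The sample string is invented for the example (it is not taken from the feed), `decode_feed` is a hypothetical helper, and its fallback order is an assumption, not something from the CFHTTP docs. It only shows how the same bytes come out under each charset, and one possible way to pick a charset when the response header doesn't name one.

```python
# Illustration only: this sample text is made up, not taken from the real feed.
sample = "caf\u00e9 \u2019quoted\u2019"  # accented "e" plus curly quotes

# Bytes as a Windows-1252 source would put them on the wire.
raw = sample.encode("cp1252")

# 1. The correct charset round-trips cleanly.
as_cp1252 = raw.decode("cp1252")

# 2. Latin-1 never raises, because every byte maps to some code point, but
#    bytes 0x80-0x9F become invisible control characters, not curly quotes.
as_latin1 = raw.decode("latin-1")

# 3. These bytes are not valid UTF-8. With errors="replace", each bad byte
#    turns into U+FFFD (the replacement character).
as_utf8 = raw.decode("utf-8", errors="replace")


def decode_feed(raw_bytes, declared=None):
    """Hypothetical helper: try the declared charset, then UTF-8, then cp1252."""
    candidates = []
    if declared:
        name = declared.strip().lower()
        # Assumption: treat a latin-1 label as Windows-1252, as web browsers do.
        if name in ("iso-8859-1", "latin-1", "latin1"):
            name = "cp1252"
        candidates.append(name)
    candidates += ["utf-8", "cp1252"]
    for charset in candidates:
        try:
            return raw_bytes.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset, try the next one
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD.
    return raw_bytes.decode("cp1252", errors="replace")


text = decode_feed(raw, declared="iso-8859-1")
```

The point of the sketch is that UTF-8 is a different byte encoding, not a superset of the single-byte charsets, so naming UTF-8 for bytes that were written as windows-1252 or latin-1 cannot recover the accented characters.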