Thanks for taking a look, Paul.  You're a little over my head with all
of the character codes.  Seems I might have been a little optimistic
in my self-evaluation if what you're talking about is obvious.  :-)

Here's what I know (and it's probably best to treat this as /all/ that I know):

1.  The feed itself specifies its encoding as iso-8859-1 in the source XML.
2.  When I make the GET request specifying the charset as iso-8859-1,
everything returns fine (at least for this feed).
3.  When I make the GET request specifying either no charset at all or
utf-8, the non-ASCII characters (what I, perhaps mistakenly, referred to
as members of an "extended character set") are not rendered properly
in the cfhttp.fileContent variable.
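A minimal sketch of what point 3 looks like at the byte level. Python stands in for ColdFusion here, and the sample string is hypothetical, not taken from the actual feed. The same bytes decode cleanly only with the charset they were written in:

```python
# Hypothetical sample text, encoded the way a windows-1252 feed would send it.
# 0xE9 is "é" in both latin-1 and windows-1252; 0x96 is an en dash only in windows-1252.
raw = "café – résumé".encode("cp1252")

as_cp1252 = raw.decode("cp1252")                  # correct: "café – résumé"
as_latin1 = raw.decode("latin-1")                 # 0x96 becomes the invisible C1 control U+0096
as_utf8 = raw.decode("utf-8", errors="replace")   # each stray byte becomes U+FFFD "?"

for label, text in (("cp1252", as_cp1252), ("latin-1", as_latin1), ("utf-8", as_utf8)):
    print(label, repr(text))
```

UTF-8 can represent "é" perfectly well, but only when the bytes on the wire were actually encoded as UTF-8. Here they were not, so every non-ASCII byte is invalid UTF-8.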

Here's what I've been assuming (clearly incorrectly):
1.  Specifying a utf-8 character set would handle most of the
non-ASCII characters and allow them to render properly.
2.  If no charset was specified and no content-type was given in the
response header, ColdFusion would default to UTF-8.  This is
straight from the CFHTTP docs, if I read and understand them correctly.

You're right about the best practice of stating a charset explicitly
and I think I'll go that direction.  The question then becomes, what
charset can I specify that will handle almost any feed out there?
UTF-8, I had thought, would handle the French alphabet at the very
least.
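Rather than picking one charset for every feed, a common approach is to honour whatever each feed declares. A rough Python sketch, assuming ASCII-compatible feeds (no UTF-16). `decode_feed` is a hypothetical helper, not a ColdFusion function. Treating an iso-8859-1 label as windows-1252 follows Paul's diagnosis below (browsers do the same for HTML); it is not something the XML spec requires.

```python
import re

# Matches the encoding pseudo-attribute in an XML declaration, e.g.
# <?xml version="1.0" encoding="iso-8859-1"?>
_XML_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def decode_feed(raw: bytes, header_charset: str = "") -> str:
    """Decode feed bytes using, in order: the HTTP header charset, the XML
    declaration's encoding, or UTF-8 (the XML default when nothing is declared)."""
    charset = header_charset.strip()
    if not charset:
        m = _XML_DECL.match(raw)
        charset = m.group(1).decode("ascii") if m else "utf-8"
    # Assumption: feeds labelled iso-8859-1 are often really windows-1252, a
    # superset that differs only in bytes 0x80-0x9F, so decode them as windows-1252.
    if charset.lower() in ("iso-8859-1", "latin-1", "latin1"):
        charset = "cp1252"
    # Bytes that are undefined in the chosen charset become U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")
```

With no response-header charset (the situation described above), the declaration in the feed wins. An explicit header charset overrides it.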

Thanks again for your help.

On 5/17/06, Paul Hastings <[EMAIL PROTECTED]> wrote:
> Rob Wilkerson wrote:
> > cfhttp.fileContent variable unless the charset is specified as
> > "iso-8859-1".  This is the character set specified within the feed
> > XML, but it's not specified in the response header.  The response
> > charset is an empty string.
>
> well first off that stuff isn't latin-1 (iso-8859-1), it appears to be the
> windows-1252 codepage, which is more like a superset of latin-1. so there are
> going to be mapping issues if you try to treat the feed as latin-1. people often
> confuse the two. next, they seem to have inserted some NCR (numeric character
> reference) goop into the content (i'm not sure if that's according to some RSS
> spec, but given other chars in the content, i'll guess "no"). btw what do you
> mean by "extended character set"?
>
> > Even specifying UTF-8 as the charset (I know it's the default, but it
> > was worth trying explicitly) does not return the characters properly.
>
> um it's NOT the default for stuff like cfhttp, cffile, etc. i think cf picks
> that up from the server. you *have* to specify the charset for those tags, &
> should anyway as good practice.
>
> > 1.  Why doesn't UTF-8 return the characters properly?  I thought that,
> > for most content, UTF-8 would handle the vast majority of characters -
> > certainly the French language's accented "e", etc.
>
> windows-1252 <> latin-1.
>
> > 2.  Do I have any options for returning these characters properly and,
> > if any, what are they?
>
> probably, don't know yet until i look.
>
> 
