No, that's not the same thing. The method you suggest assumes an exact encoding; the sniffer functions in TextEncodingConverter.h examine the data, check whether it follows the byte patterns appropriate for each encoding in a suggested set, and tell you which one is the best match. Such sniffers are typically most useful for distinguishing DBCS (double-byte character set) encodings, where the data contains multibyte sequences like those you'd find in Shift-JIS and similar encodings. Let me know when you find the "Cocoa" way to do this.
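For illustration only, here is a crude stand-in for that idea in Python. It is not TEC's algorithm and does not use the TextEncodingConverter API: it decodes the bytes leniently with each candidate encoding and picks the candidate that produces the fewest U+FFFD replacement characters. The function name `sniff` is hypothetical. A real sniffer weighs byte patterns far more carefully; this one cannot tell apart candidates that both decode cleanly (ties go to the earlier candidate), and a gap-free encoding such as ISO Latin-1 would always score zero.

```python
# Crude encoding "sniffer" sketch: NOT TECSniffTextEncoding, just an
# illustration of scoring a suggested set of candidate encodings.
from typing import Sequence


def sniff(data: bytes, candidates: Sequence[str]) -> str:
    """Return the candidate whose lenient decode yields the fewest U+FFFD."""

    def bad_chars(encoding: str) -> int:
        # errors="replace" turns every undecodable byte sequence into U+FFFD
        return data.decode(encoding, errors="replace").count("\ufffd")

    # min() returns the first candidate among equal scores, so list the
    # candidates from most to least preferred.
    return min(candidates, key=bad_chars)


if __name__ == "__main__":
    sjis = "日本語".encode("shift_jis")
    print(sniff(sjis, ["utf-8", "shift_jis"]))  # shift_jis
```

Shift-JIS lead bytes such as 0x93 are invalid as UTF-8 start bytes, so UTF-8 collects replacement characters while Shift-JIS decodes cleanly, which is the kind of DBCS pattern difference described above.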
>More modern and more Cocoa way? You mean something like this
>[NSString stringWithContentsOfFile:usedEncoding:error:] ;-)
>
>"Discussion
> This method attempts to determine the encoding of the file at path."
>
>On May 7, 2008, at 19:33, Gary L. Wade wrote:
>
>> If you're interested in determining the best encoding match for
>> text, look at the TextEncodingConverter.h header, which has
>> functions related to encoding sniffing. There may be more modern
>> techniques available, but I had used that almost a decade ago in a
>> formerly major web browser. It's not perfect, of course, but it
>> might be the best solution for your problem.
>>
>>>
>>> On May 6, 2008, at 9:22 PM, Jens Alfke wrote:
>>>
>>>>
>>>> On 6 May '08, at 10:45 AM, Aki Inoue wrote:
>>>>
>>>>> Actually, I don't recommend using CP1252 as the generic fallback
>>>>> encoding like this.
>>>>> The encoding does have gaps, and the handling of those invalid gaps
>>>>> varies between conversion engines. CF/NSString treat the invalid
>>>>> bytes strictly and return nil when encountering those.
>>>>
>>>> I wasn't aware it had gaps; I've never run into them. Where are
>>>> they?
>>>
>>> <http://en.wikipedia.org/wiki/Windows-1252>
>>>
>>> 5 characters in the 0x80..0x9F range.
>>>
>>>>> So, our recommendation now is to try UTF-8 first; then, try some
>>>>> other encoding deduced from the context (user's localization,
>>>>> intended source/destination of the data, etc). If all fail, you
>>>>> should try MacRoman as the ultimate fallback (the encoding has no
>>>>> gaps, so it never fails).
>>>>
>>>> In the contexts I've been dealing with (data fetched over HTTP from
>>>> random websites), there hasn't been anything deducible from the
>>>> context (assuming the HTTP Content-Type already failed). In that
>>>> situation MacRoman is not at all a good fallback, as almost no Web
>>>> content uses it; CP-1252 or ISO-Latin-1 are the most likely
>>>> fallbacks after UTF-8.
>>>
>>>
>>> I will agree with this if it's web content you're dealing with.
>>> In that case, just fall back to windows-1252. Lots of site content
>>> was authored with that encoding and mistakenly marked as ISO_8859-1. But
>>> that's a topic for another forum.
>>>
>> _______________________________________________
>>
>> Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
>>
>> Please do not post admin requests or moderator comments to the list.
>> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>>
>> Help/Unsubscribe/Update your Subscription:
>> http://lists.apple.com/mailman/options/cocoa-dev/devlists%40shadowlab.org
>>
>> This email sent to [EMAIL PROTECTED]
>>
>
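The fallback order Aki describes (UTF-8 first, then encodings deduced from context, then a gap-free last resort), together with the Windows-1252 gaps, can be sketched in Python. This is an illustration under assumptions, not CF/NSString behavior: Python's strict `cp1252` codec happens to reject the same five undefined bytes, which mimics the nil result described in the thread. The names `decode_with_fallback` and `cp1252_gaps` are hypothetical. For web content, following the quoted advice, you would pass `cp1252` as a context encoding and might use `latin-1` (also gap-free) as the last resort instead of MacRoman.

```python
# Sketch of a "try UTF-8, then context, then gap-free fallback" decoder.
# Python codecs stand in for CFString/NSString conversions (an assumption).
from typing import Iterable, List, Tuple


def cp1252_gaps() -> List[int]:
    """Bytes in 0x80..0x9F that a strict Windows-1252 decoder rejects."""
    gaps = []
    for b in range(0x80, 0xA0):
        try:
            bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            gaps.append(b)
    return gaps


def decode_with_fallback(
    data: bytes,
    context: Iterable[str] = (),
    last_resort: str = "mac_roman",
) -> Tuple[str, str]:
    """Return (decoded text, name of the encoding that succeeded)."""
    # Steps 1 and 2: UTF-8, then any encodings deduced from context.
    # Strict decoding is essential: a lenient decoder would "succeed"
    # with substituted characters and never reach the fallback.
    for encoding in ("utf-8", *context):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Step 3: the last resort should map all 256 byte values (MacRoman
    # and Latin-1 both do), so this final decode cannot fail.
    return data.decode(last_resort), last_resort


if __name__ == "__main__":
    print([hex(b) for b in cp1252_gaps()])
    print(decode_with_fallback(b"caf\xe9", context=["cp1252"]))
    print(decode_with_fallback(b"\x81", context=["cp1252"]))
```

Here b"\x81" fails as UTF-8 and hits one of the CP1252 gaps, so it only decodes at the MacRoman step.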