On Aug 18, 2011, at 1:02 PM, Wilker wrote:

> But I don't know which encoding the string is using... And when it has some
> Latin or other kinds of characters, the return is "nil".
> In my case I really don't care about these characters; if I can just remove
> the non-ASCII characters from the C string and then convert it to NSString,
> that will be fine for me.
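For comparison, the lossy approach asked about above (drop every non-ASCII byte, then convert) can be sketched in plain C. `strip_non_ascii` is a hypothetical helper, not a Cocoa API. Its result is pure ASCII, so it could then be passed to -[NSString initWithCString:encoding:] with NSASCIIStringEncoding. The cost is that accented letters disappear entirely instead of being decoded.

```c
#include <stddef.h>

/* Hypothetical helper: copies the NUL-terminated string src into dst,
   dropping every byte >= 0x80 (i.e. every non-ASCII byte).
   dst must have room for at least strlen(src) + 1 bytes.
   Returns the length of the stripped string.
   Lossy by design: an accented letter such as "é" simply vanishes. */
size_t strip_non_ascii(const char *src, char *dst)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)src; *p != '\0'; p++) {
        if (*p < 0x80) {
            dst[n++] = (char)*p;   /* keep plain ASCII bytes only */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Called on the UTF-8 bytes of "Café au lait", it yields "Caf au lait" (11 bytes). That loss is why decoding with a sensible fallback is usually preferable to stripping.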

If you can find out what the actual encoding is, it’s best to use it. But I
know there are cases where you don’t know it, or where the declared encoding
is wrong (hello, RSS feeds…).

What I’ve done as a heuristic is to first try UTF-8, as you’re doing, and if
that fails (returns nil), fall back to CP-1252 (NSWindowsCP1252StringEncoding).
The rationale is that (a) this is the default encoding used by MS Windows, at
least in Western countries; (b) its printable characters are a superset of the
widely used ISO-8859-1 (aka ISO-Latin-1), which is itself a superset of ASCII;
and (c) it assigns a character to nearly every one of the 256 byte values
(only 0x81, 0x8D, 0x8F, 0x90 and 0x9D are officially undefined).

So in practice the conversion to NSString should essentially never fail; ASCII
characters come out correctly; and non-ASCII Latin characters will quite often
come out OK.
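In Cocoa the heuristic above amounts to two calls to -[NSString initWithBytes:length:encoding:]: first with NSUTF8StringEncoding, then with NSWindowsCP1252StringEncoding if the first returns nil. To show what is actually going on at the byte level, here is a sketch of the same idea in plain C, producing UTF-8 output. The function names are hypothetical. The five unassigned CP-1252 bytes are mapped to the C1 control characters with the same value. That mapping is an assumption (it matches the WHATWG Encoding Standard), so the fallback path never fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* CP-1252 bytes 0x80-0x9F as Unicode code points; all other bytes map to
   the same code point (as in ISO-8859-1). Unassigned bytes (0x81, 0x8D,
   0x8F, 0x90, 0x9D) map to the C1 control with the same value (assumption). */
static const unsigned short CP1252_HIGH[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

/* True if buf[0..len) is well-formed UTF-8: no overlong forms,
   no UTF-16 surrogates, nothing above U+10FFFF, no truncated sequences. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        unsigned int cp;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
        else return false;                  /* cannot start a sequence */
        if (len - i <= extra) return false; /* truncated at end of buffer */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false; /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += extra + 1;
    }
    return true;
}

/* Encodes a code point <= U+FFFF (enough for CP-1252) as UTF-8. */
static size_t put_utf8(unsigned int cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: writes a NUL-terminated UTF-8 string into out and
   returns its length, or (size_t)-1 if out might be too small
   (3 * len + 1 bytes always suffices). *used_fallback reports whether the
   input was invalid UTF-8 and was therefore decoded as CP-1252. */
size_t decode_utf8_or_cp1252(const unsigned char *in, size_t len,
                             char *out, size_t outcap, bool *used_fallback)
{
    size_t n = 0;
    if (is_valid_utf8(in, len)) {
        *used_fallback = false;
        if (len + 1 > outcap) return (size_t)-1;
        memcpy(out, in, len);
        n = len;
    } else {
        *used_fallback = true;
        for (size_t i = 0; i < len; i++) {
            unsigned char b = in[i];
            unsigned int cp = (b >= 0x80 && b <= 0x9F) ? CP1252_HIGH[b - 0x80] : b;
            if (n + 3 + 1 > outcap) return (size_t)-1;
            n += put_utf8(cp, out + n);
        }
    }
    out[n] = '\0';
    return n;
}
```

With this design, valid UTF-8 always wins. Misreading genuine CP-1252 text as UTF-8 is unlikely, because accented Latin letters in CP-1252 rarely form valid multi-byte UTF-8 sequences by accident.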

Unfortunately this approach doesn’t work at all if the source data is not in an
8-bit encoding, for example UTF-16 or one of the old multi-byte Asian
encodings. I’m sure there are heuristics for those cases, but I don’t know them.

If you’re reading the string from a file or HTTP URL, you can also use
-[NSString initWithContentsOfFile:usedEncoding:error:] or
-initWithContentsOfURL:usedEncoding:error:, which will attempt to figure out
the correct encoding using clues like the filename extension, leading BOM
bytes, etc.
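One of those clues, a leading byte-order mark (BOM), is easy to check yourself when all you have is raw bytes. The sketch below is a hypothetical helper, not Apple's detection logic, and it recognizes only BOMs. Many files, especially UTF-8 ones, have no BOM at all, in which case it reports BOM_NONE and you are back to heuristics. A detected kind would map naturally onto NSUTF8StringEncoding, NSUTF16BigEndianStringEncoding, NSUTF16LittleEndianStringEncoding, and so on.

```c
#include <stddef.h>

/* Hypothetical labels for the encodings a byte-order mark can announce. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF32_BE,
    BOM_UTF32_LE
} bom_kind;

/* Inspects the first bytes of buf and stores the BOM length in *bom_len
   (0 if none), so the caller can skip it before decoding. */
bom_kind sniff_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    /* UTF-32 LE must be checked before UTF-16 LE, since both start FF FE.
       (UTF-16 LE text whose first character is U+0000 is ambiguous here.) */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4; return BOM_UTF32_LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4; return BOM_UTF32_BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3; return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2; return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2; return BOM_UTF16_LE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```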

Jens

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
