There's actually just one simple question, but there's a bit of background for context:
--


In my app, I import data from potentially very large files. In the first pass, I simply mmap'd the entire file, created a string using CFStringCreateWithBytesNoCopy, and went about my business. This works great until it hits the address space limit when running as a 32-bit process, so in the second pass I want to rework it to mmap only one chunk (128 MB) at a time.
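
To make the chunked approach concrete, here is a rough sketch of mapping one window of the file. It assumes a POSIX system and relies on the fact that the mmap offset has to be a multiple of the page size, so the helper maps from the page boundary below the requested offset and hands back a pointer adjusted by the difference. The chunk_map struct and both function names are made up for illustration, and the caller is assumed to keep offset + len within the file size.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for one mapped window of the file. */
typedef struct {
    void  *base;                /* address returned by mmap; needed for munmap */
    size_t map_len;             /* length passed to mmap; needed for munmap */
    const unsigned char *bytes; /* first byte the caller actually asked for */
    size_t len;                 /* number of bytes the caller asked for */
} chunk_map;

/* Maps file bytes [offset, offset + len) read-only.
   Returns 0 on success and -1 on failure. */
int chunk_map_open(int fd, off_t offset, size_t len, chunk_map *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    /* mmap wants a page-aligned offset, so start at the page boundary
       at or below the requested offset and remember the slack. */
    off_t  aligned = offset - (offset % page);
    size_t slack   = (size_t)(offset - aligned);

    void *base = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    out->base    = base;
    out->map_len = len + slack;
    out->bytes   = (const unsigned char *)base + slack;
    out->len     = len;
    return 0;
}

/* Unmaps a window previously opened with chunk_map_open. */
void chunk_map_close(chunk_map *m)
{
    munmap(m->base, m->map_len);
}
```

The bytes/len pair is what would be handed to CFStringCreateWithBytesNoCopy once the range has been trimmed to a character boundary.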

Now, if it were simply binary data, I could chop up the file however I wanted, but since the file I'm processing is actually a huge *text* file, I need to mmap an appropriate range so creating the string doesn't fail because a multi-byte character was split down the middle.

Obviously if the file's encoding is always single-byte (IOW, CFStringGetMaximumSizeForEncoding(1, encoding) returns 1), then I can just use any range I want. If it's UTF-8 or UTF-16, I can check the high bits to figure out the right range before creating a string. But I know next to nothing about pretty much any other encoding, so I don't know which ones are fixed-width and which ones are variable-width like UTF-8.
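
For reference, this is roughly what I mean by checking the high bits. It is only a sketch: both function names are hypothetical, the chunk is assumed to start on a character boundary, and the data is assumed to be well-formed (anything malformed is left for the string creation call to reject). For UTF-16 the byte order has to be known up front.

```c
#include <stddef.h>

/* Returns the largest n <= len such that buf[0..n) does not end in the
   middle of a UTF-8 sequence. */
size_t utf8_safe_length(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;             /* no lead byte found; let the decoder decide */

    /* buf[i - 1] should now be the lead byte of the last character. */
    unsigned char lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                 need = 1;  /* 0xxxxxxx */
    else if ((lead & 0xE0) == 0xC0)  need = 2;  /* 110xxxxx */
    else if ((lead & 0xF0) == 0xE0)  need = 3;  /* 1110xxxx */
    else if ((lead & 0xF8) == 0xF0)  need = 4;  /* 11110xxx */
    else
        return len;             /* invalid lead byte; let the decoder decide */

    size_t have = len - (i - 1);    /* bytes present for the last character */
    return (have >= need) ? len : i - 1;
}

/* Same idea for UTF-16: keep whole 16-bit code units only, and do not end
   on a high (leading) surrogate, which needs the following code unit. */
size_t utf16_safe_length(const unsigned char *buf, size_t len, int big_endian)
{
    size_t n = len - (len % 2);
    if (n >= 2) {
        /* High-order byte of the last code unit. */
        unsigned char hi = big_endian ? buf[n - 2] : buf[n - 1];
        if ((hi & 0xFC) == 0xD8)    /* 0xD800..0xDBFF */
            n -= 2;
    }
    return n;
}
```

For the three bytes 'a', 0xC3, 0xA9 (an "a" followed by "é"), asking for a 2-byte chunk gives back 1, so the two-byte character stays whole and becomes the start of the next chunk.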

If I can identify an encoding as fixed-width and know what the width is, it's really easy to handle the range by simply using a multiple of whatever the width is. And more or less, I'd expect that all of the other variable-width encodings would need some special handling like UTF-8.
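
The fixed-width case really is that small. As a sketch (the function name is hypothetical, and unit_width is whatever byte width I manage to determine for the encoding):

```c
#include <stddef.h>

/* Rounds len down to a whole number of fixed-width characters.
   unit_width is the encoding's width in bytes, e.g. 4 for UTF-32. */
size_t fixed_width_safe_length(size_t len, size_t unit_width)
{
    if (unit_width == 0)
        return 0;               /* unknown width: map nothing rather than guess */
    return len - (len % unit_width);
}
```

With a width of 4, a requested length of 134217730 bytes (128 MB plus 2) is trimmed to 134217728.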

So, I generally know what I should do, but the problem is that I don't know how to identify an encoding as fixed-width or variable-width. I could spend the time to look up each and every encoding on the internet, but there are kind of a lot of them :) And then my code wouldn't be future-proof if an encoding is added.


Can anyone offer some insight into how I could dynamically determine an encoding's characteristics? Or maybe I should just hard code it/do it by hand because there are really very few cases to handle.


Thanks,

--
Seth Willits



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
