On Feb 2, 2009, at 7:50 PM, Joar Wingfors wrote:

Before opening the file, either determine, guess, or be told what the encoding is. With that encoding, convert your delimiter string into raw bytes, then do byte-for-byte comparison on the file to find occurrences of that delimiter.

How do you know what delimiter string to use?

Well, the original poster said he wants to read lines, so \r, \n, or \r\n is your delimiter. It depends on your usage. If you're reading a binary file, then the combination of encoding and delimiter isn't an issue, since you'll have either fixed-size data chunks or a length value stored in the file itself telling you how big a chunk is. But if you're reading a text file, line endings are more or less the only logical delimiter.
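A minimal sketch of that approach in Python (not from the original thread): encode the three line endings in the file's encoding, then scan the raw bytes for them. The name `split_lines` and its `unit` parameter are hypothetical. The sketch assumes the whole file is already in memory as `data`, and that the encoding is passed as a BOM-free codec name such as "utf-16-le", since Python's plain "utf-16" codec prepends a byte-order mark when encoding.

```python
def split_lines(data: bytes, encoding: str, unit: int = 1) -> list[bytes]:
    """Split raw bytes on \\r\\n, \\r, or \\n without decoding them.

    `unit` is the code-unit width in bytes (1 for ASCII or UTF-8,
    2 for UTF-16). The scan only ever looks at offsets that are a
    multiple of `unit`, so a delimiter pattern that straddles two
    code units is never mistaken for a line ending.
    """
    # Longest delimiter first so "\r\n" wins over a lone "\r".
    delimiters = [d.encode(encoding) for d in ("\r\n", "\r", "\n")]
    lines, start, i = [], 0, 0
    while i < len(data):
        match = next((d for d in delimiters if data.startswith(d, i)), None)
        if match is None:
            i += unit  # step one code unit, staying aligned
            continue
        lines.append(data[start:i])
        i += len(match)
        start = i
    lines.append(data[start:])  # whatever follows the last delimiter
    return lines


# Usage: the returned chunks are still raw bytes in the original encoding.
utf8_lines = split_lines("a\r\nb\nc".encode("utf-8"), "utf-8")
utf16_lines = split_lines("a\nb".encode("utf-16-le"), "utf-16-le", unit=2)
```

Stepping by `unit` matters for fixed-width multi-byte encodings: in UTF-16LE the bytes 0A 00 can appear across the boundary between two unrelated characters, and a scan that checks every byte offset would split there.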


If you have an encoding where characters are not of fixed width, is it generally safe to assume that the byte signature of the valid delimiter strings for that encoding cannot also be found as a subpattern of some combination of other characters? Perhaps that would always be a safe assumption; I'm no expert on string encodings and line delimiters.

I actually thought about that as well, and honestly I'm not 100% sure. I can see it being anywhere from plausible to very likely to be a problem, and I'm not sure which is correct. Oops. But it should be universally easy to deal with by checking the MSB on the preceding byte if it's not fixed-width? (I think)
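For UTF-8 specifically, the check turns out to be on the candidate byte itself rather than the preceding one: every byte of a multi-byte UTF-8 character has its most significant bit set, so a byte below 0x80 (including \r and \n) is always a complete character. A small Python sketch of that property follows. The function names are hypothetical, and it assumes the data is valid UTF-8. Other variable-width encodings have their own rules and would need their own check.

```python
def newline_offsets_utf8(data: bytes) -> list[int]:
    """Return the offsets of every CR (0x0D) and LF (0x0A) byte.

    Assumes `data` is valid UTF-8. Lead and continuation bytes of
    multi-byte characters are all >= 0x80, so a 0x0A or 0x0D byte
    can never be part of another character.
    """
    return [i for i, b in enumerate(data) if b in (0x0A, 0x0D)]


def multibyte_bytes_have_msb_set(text: str) -> bool:
    """Check the MSB property for every non-ASCII character in `text`."""
    return all(
        b & 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


# Usage: "é" encodes to two bytes (C3 A9), both with the MSB set,
# so the LF that follows it is found at offset 2.
offsets = newline_offsets_utf8("é\nx\r\n".encode("utf-8"))
```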


--
Seth Willits


