Re: Number of chars
On 21 Mar, 2013, at 5:07 PM, Luca Ciciriello luca_cicirie...@hotmail.com wrote:

> An example of string is the Italian word più. Here I can visually count 3 chars: p, i and ù. But if I use più.size() the result is 4:
>
>     std::string ppp = "più";
>     size_t sss = ppp.size();
>
> Here sss is 4.
>
> L.

Not totally a Cocoa question, it's really C++. If you were asking about the original NSString, there are lots of Cocoa methods to do things.

The size of a string in C++ is in bytes, so it all rather depends on what encoding your string is in, which depends on how you're doing the conversion from NSString to C++, which I don't think you said. My wild guess would be that it's UTF-8, since getting 1 byte for most characters and 2 bytes for a ù sounds correct for UTF-8. What's the actual hex of those 4 bytes for that string? 0x70 0x69 0xC3 0xB9 is, I think, the encoding for that.

If it is UTF-8 then it's not so hard: look at the encoding detail for UTF-8 and inspect the top bit or bits of each byte to determine whether it's a whole character, the start of a multibyte sequence, or a continuation of one. I would imagine there are C++ functions to do that.

Also remember there is more than one way to produce a ù. One is using the single Unicode character for it; another is to use a 'u' and a combining diacritic mark. That will be 4 Unicode characters and, in UTF-8, 5 bytes for that string.

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
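The byte inspection described in this message could look roughly like the sketch below. It assumes the std::string already holds valid UTF-8 (the thread only guesses at the encoding), and count_code_points is a made-up name for illustration, not a standard library function. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx), which gives the number of Unicode code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a std::string that is
// ASSUMED to contain valid UTF-8. No validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes look like 10xxxxxx. Every other byte is either
        // a single-byte character or the lead byte of a multi-byte sequence,
        // so each one marks the start of a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the bytes 0x70 0x69 0xC3 0xB9 this returns 3 while size() returns 4. For the decomposed spelling (a plain u followed by the combining grave accent U+0300, encoded as 0xCC 0x80) it returns 4, so it counts code points and not what a reader would perceive as characters.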
Re: Number of chars
I apologize for leading you the wrong way Luca!

-Luther

On Thu, Mar 21, 2013 at 9:46 AM, Luca Ciciriello luca_cicirie...@hotmail.com wrote:

> Ok, thanks.
>
> Luca.
>
> On Mar 21, 2013, at 3:43 PM, Glenn L. Austin gl...@austin-soft.com wrote:
>
>> On Mar 21, 2013, at 2:34 AM, Jean-Daniel Dupas devli...@shadowlab.org wrote:
>>
>>> On 21 March 2013, at 09:27, Luca Ciciriello luca_cicirie...@hotmail.com wrote:
>>>
>>>> Hi all. I'm using some Objective-C++ modules in my iOS project. There I have some conversions from NSString to C++11 std::string. After this conversion I found (correctly) some 2-byte characters in my std::string. My question is: how can I count the number of chars, and not the number of bytes, in my std::string?
>>>
>>> Don't use std::string to store Unicode strings. It is not designed to support such content. You can use std::wstring instead.
>>
>> Actually, std::string works *just fine* for UTF-8 strings. It's just that, in Unicode, 1 character doesn't necessarily fit in 1 byte. Also, you can't easily do truncation of strings (you might be truncating the string in the middle of a multi-byte sequence -- which is true in pretty much every encoding except UCS-4).
>>
>> UTF-8 is relatively easy to work with, however. You look at the previous byte in the string to see if your current character is part of a multi-byte sequence or not -- and keep going back until you find one that doesn't have the high-bit set, and that's the last character of the previous sequence. Of course, that "go back" doesn't mean anything if you're already at the first byte in your string...
>>
>> -- Glenn L. Austin, Computer Wizard and Race Car Driver
>> http://www.austin-soft.com
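The truncation problem raised in the quoted reply can be sketched as follows. This is only an illustration under stated assumptions: the string is taken to be valid UTF-8, and truncate_utf8 is a hypothetical name. Note that the sketch uses a slightly different boundary test than the one described in the message: it backs up while the byte at the cut point is a continuation byte (10xxxxxx), because the lead byte of a multi-byte sequence also has its high bit set and is a legitimate place to cut in front of.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the longest prefix of `utf8` that is at most
// `max_bytes` long and does not end inside a multi-byte sequence.
// ASSUMES valid UTF-8; no validation is performed.
std::string truncate_utf8(const std::string& utf8, std::size_t max_bytes) {
    if (utf8.size() <= max_bytes) {
        return utf8;  // nothing to cut
    }
    std::size_t end = max_bytes;
    // utf8[end] is the first byte that would be dropped. If it is a
    // continuation byte, the cut falls inside a sequence, so move the cut
    // left until it sits in front of a lead byte or a single-byte character.
    while (end > 0 &&
           (static_cast<unsigned char>(utf8[end]) & 0xC0) == 0x80) {
        --end;
    }
    return utf8.substr(0, end);
}
```

With the bytes 0x70 0x69 0xC3 0xB9 and a limit of 3, the naive cut would keep a dangling 0xC3; the sketch returns the 2-byte prefix "pi" instead. It protects code point boundaries only, so a base letter can still be separated from a following combining mark.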
Re: Number of chars
On Mar 21, 2013, at 11:10 AM, Aki Inoue a...@apple.com wrote:

> Please note that std::string does not provide the localized behavior for collation, searching, case mapping, etc that our customers are accustomed to. If you're handling user visible strings, we recommend sticking to NSString at least for those operations.

Or else start digging into the ICU C++ API. But yes, it’s definitely best to stick with NSString as much as possible, except as necessary to glue into some cross-platform or legacy C++ code.

--Jens
Re: Number of chars
No problem :-)

Luca

On Mar 21, 2013, at 5:59 PM, Luther Baker lutherba...@gmail.com wrote:

> I apologize for leading you the wrong way Luca!
>
> -Luther
Re: Number of chars
On Mar 21, 2013, at 6:05 PM, Andrew Thompson lordpi...@me.com wrote:

> On Mar 21, 2013, at 2:10 PM, Aki Inoue a...@apple.com wrote:
>
>> For that matter, UTF-32 (aka UCS-4) is not safe to find the truncation boundary just at the 4-byte boundary.
>
> You're thinking of combining marks here?

Yes.

> It's generally claimed that one can multiply character offsets by 4 to index into UCS-4 data… which I think I now see is only true depending on your definition of character; i.e., whether one considers a decomposed sequence to be one character or two. I see how truncation would be unsafe because you'd chop off the accents etc?

Yes.

Aki
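A small sketch of why cutting UTF-32 at a 4-byte boundary can still chop off an accent. Everything here is illustrative: truncate_utf32 and is_combining_diacritic are hypothetical names, and the range check covers only the Combining Diacritical Marks block (U+0300 to U+036F), which is a deliberate simplification. Real user-perceived character boundaries need full Unicode data, for example through NSString or ICU as recommended earlier in the thread.

```cpp
#include <cstddef>
#include <string>

// SIMPLIFYING ASSUMPTION: treats only U+0300..U+036F as combining marks.
// Many other combining ranges exist; this is not a complete check.
bool is_combining_diacritic(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: keeps at most `max_units` code points, but never
// cuts between a base character and the combining marks that follow it.
std::u32string truncate_utf32(const std::u32string& text,
                              std::size_t max_units) {
    if (text.size() <= max_units) {
        return text;  // nothing to cut
    }
    std::size_t end = max_units;
    // text[end] is the first code point that would be dropped. If it is a
    // combining mark, its base character would lose the accent, so move the
    // cut left past the marks and past the base character they belong to.
    while (end > 0 && is_combining_diacritic(text[end])) {
        --end;
    }
    return text.substr(0, end);
}
```

For the decomposed spelling of più (p, i, u, U+0300), a plain cut after 3 code points would leave "piu" without its accent, while the sketch returns "pi". The precomposed spelling (p, i, U+00F9) is 3 code points and passes through unchanged at the same limit.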