Re: Number of chars

2013-03-21 Thread Roland King

On 21 Mar, 2013, at 5:07 PM, Luca Ciciriello luca_cicirie...@hotmail.com 
wrote:

 An example string is the Italian word "più".
 Here I can visually count 3 chars: p, i and ù. But if I call size() on it, the
 result is 4:
 
 std::string ppp = "più";
 size_t sss = ppp.size();
 
 Here sss is 4.
 
 L.
 

This is not really a Cocoa question; it's a C++ one. If you were asking about
the original NSString, there are plenty of Cocoa methods for this.

The size of a C++ std::string is measured in bytes, so the answer depends on
the encoding of your string, which in turn depends on how you convert from
NSString to C++, and I don't think you said.

My wild guess would be that it's UTF-8: one byte for most characters and two
bytes for ù is what UTF-8 would give you. What are the actual hex values of
those 4 bytes? I think the UTF-8 encoding of that string is
0x70 0x69 0xC3 0xB9.
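
One way to answer the hex question is to dump the bytes directly. This is only a minimal sketch: `hex_bytes` is a hypothetical helper name, not something from the standard library or Cocoa, and the test string is written with byte escapes so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte of s as two uppercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Usage check: "più" spelled out as its assumed UTF-8 bytes.
void hex_bytes_example() {
    const std::string piu = "pi\xC3\xB9";
    assert(piu.size() == 4);
    assert(hex_bytes(piu) == "70 69 C3 B9");
}
```

If the dump of your own string shows 70 69 C3 B9, the conversion from NSString produced UTF-8.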

If it is UTF-8 then it's not so hard. Look at the details of the UTF-8
encoding: you inspect the top bits of each byte to determine whether it is a
whole character (0xxxxxxx), the start of a multi-byte sequence (11xxxxxx) or a
continuation of one (10xxxxxx). I would imagine there are C++ functions to do
that. Also remember that there is more than one way to produce a ù: one is the
single Unicode character for it, another is a 'u' followed by a combining
diacritical mark. In the second form that string is 4 Unicode characters and,
in UTF-8, 5 bytes.
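
The top-bit inspection described above can be sketched as follows. It assumes the string holds valid UTF-8, and `utf8_code_points` is a hypothetical name. It counts Unicode code points, not what a user sees as one character, which is why the decomposed spelling counts as 4.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string assumed to hold valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// (0xxxxxxx or 11xxxxxx) starts a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Usage check with both spellings of "più", written as byte escapes.
void utf8_code_points_example() {
    const std::string precomposed = "pi\xC3\xB9";   // p, i, U+00F9
    const std::string decomposed  = "piu\xCC\x80";  // p, i, u, U+0300
    assert(precomposed.size() == 4 && utf8_code_points(precomposed) == 3);
    assert(decomposed.size() == 5 && utf8_code_points(decomposed) == 4);
}
```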
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: Number of chars

2013-03-21 Thread Luther Baker
I apologize for leading you the wrong way Luca!

-Luther



On Thu, Mar 21, 2013 at 9:46 AM, Luca Ciciriello 
luca_cicirie...@hotmail.com wrote:

 Ok, thanks.

 Luca.

 On Mar 21, 2013, at 3:43 PM, Glenn L. Austin gl...@austin-soft.com
 wrote:

 
  On Mar 21, 2013, at 2:34 AM, Jean-Daniel Dupas devli...@shadowlab.org
 wrote:
 
 
  Le 21 mars 2013 à 09:27, Luca Ciciriello luca_cicirie...@hotmail.com
 a écrit :
 
  Hi all.
  In my iOS project I'm using some Objective-C++ modules, where I convert
 from NSString to C++11 std::string. After this conversion I
 find (as expected) some 2-byte characters in my std::string.
  My question is: how can I count the number of characters, rather than the
 number of bytes, in my std::string?
 
 
  Don't use std::string to store Unicode strings. It is not designed to
 support such content.
 
  You can use std::wstring instead.
 
 
  Actually, std::string works *just fine* for UTF-8 strings.
 
  It's just that, in Unicode, 1 character doesn't necessarily fit in 1
 byte.  Also, you can't easily do truncation of strings (you might be
 truncating the string in the middle of a multi-byte sequence -- which is
 true in pretty much every encoding except UCS-4).
 
  UTF-8 is relatively easy to work with, however.  You look at the byte at
 your current position to see whether it is a continuation byte of a
 multi-byte sequence (top two bits 10) -- and if so, keep going back until
 you find a byte that is not a continuation byte; that is the first byte of
 the sequence.  Of course, going back doesn't mean anything if
 you're already at the first byte in your string...
 
  --
  Glenn L. Austin, Computer Wizard and Race Car Driver 
  http://www.austin-soft.com
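
The backward scan described above might look like this when used for truncation. This is a sketch that assumes valid UTF-8 input, and `utf8_truncate` is a hypothetical name. It only avoids cutting inside a multi-byte sequence; it does not keep combining marks together with their base character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the longest prefix of s that is at most max_bytes long and does
// not end inside a multi-byte sequence. Assumes s holds valid UTF-8.
std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s;
    std::size_t cut = max_bytes;
    // s[cut] is the first byte that would be dropped. While it is a
    // continuation byte (10xxxxxx), step back towards the lead byte.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return s.substr(0, cut);
}

// Usage check: cutting "più" at 3 bytes would split the two-byte ù,
// so the whole sequence is dropped instead.
void utf8_truncate_example() {
    const std::string piu = "pi\xC3\xB9";
    assert(utf8_truncate(piu, 3) == "pi");
    assert(utf8_truncate(piu, 4) == piu);
}
```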
 
 



Re: Number of chars

2013-03-21 Thread Jens Alfke

On Mar 21, 2013, at 11:10 AM, Aki Inoue a...@apple.com wrote:

 Please note that std::string does not provide the localized behavior for 
 collation, searching, case mapping, etc. that our customers are accustomed to.
 If you're handling user-visible strings, we recommend sticking to NSString, at 
 least for those operations.

Or else start digging into the ICU C++ API. 

But yes, it’s definitely best to stick with NSString as much as possible, 
except as necessary to glue into some cross-platform or legacy C++ code.

—Jens

Re: Number of chars

2013-03-21 Thread Luca Ciciriello
No problem :-)

Luca

On Mar 21, 2013, at 5:59 PM, Luther Baker lutherba...@gmail.com wrote:

 I apologize for leading you the wrong way Luca!
 
 -Luther
 
 
 

Re: Number of chars

2013-03-21 Thread Aki Inoue

On Mar 21, 2013, at 6:05 PM, Andrew Thompson lordpi...@me.com wrote:

 
 
 On Mar 21, 2013, at 2:10 PM, Aki Inoue a...@apple.com wrote:
 
 For that matter, even in UTF-32 (aka UCS-4) it is not safe to truncate at an 
 arbitrary 4-byte boundary.
 
 You're thinking of combining marks here?
Yes.

 It's generally claimed that one can multiply character offsets by 4 to index 
 into UCS-4 data… which I think I now see is true only for some definitions of 
 character, i.e., it depends on whether one considers a decomposed sequence to 
 be one character or two.

 I see how truncation would be unsafe because you'd chop off the accents etc?
Yes.

Aki
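
To illustrate the point about combining marks, here is a small sketch using std::u32string. It makes a loud simplification: only the block U+0300..U+036F is treated as combining, whereas real text needs proper grapheme cluster rules (for example from NSString or ICU). Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplification (assumption): only U+0300..U+036F, the Combining
// Diacritical Marks block, is treated as combining here.
bool is_basic_combining_mark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Truncates to at most max_units code points without separating a base
// character from the combining marks that follow it.
std::u32string truncate_keeping_marks(const std::u32string& s,
                                      std::size_t max_units) {
    if (max_units >= s.size()) return s;
    std::size_t cut = max_units;
    // If the first dropped code point is a combining mark, the cut would
    // strip an accent from its base, so back up and drop the base as well.
    while (cut > 0 && is_basic_combining_mark(s[cut])) --cut;
    return s.substr(0, cut);
}

// Usage check with the decomposed spelling of "più".
void truncate_keeping_marks_example() {
    const std::u32string decomposed = U"piu\u0300";  // p, i, u, U+0300
    assert(decomposed.size() == 4);
    assert(decomposed.substr(0, 3) == U"piu");       // naive cut loses the accent
    assert(truncate_keeping_marks(decomposed, 3) == U"pi");
}
```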

 
 

