Re: utf-8, strlen(), and virtcol()

2007-09-20 Thread Charles E Campbell Jr

Tony Mechelynck wrote:


It all depends on what exactly you want to do. (I haven't read the Align.vim 
docs.) The length of a UTF-8 string can be counted in several nonequivalent 
ways:

- number of bytes (Latin a + combining circumflex is three bytes):
   strlen(string)

- number of codepoints (Latin a + combining circumflex is two codepoints):
   strlen(substitute(string, '.', 'x', 'g'))

- number of spacing codepoints (Latin a + combining circumflex is one spacing 
codepoint; a hard tab is one; wide and narrow CJK are one each; etc.): 
(untested)
   strlen(substitute(string, '.\Z', 'x', 'g'))

- virtual length (counting, for instance, tabs as anything between 1 and 
'tabstop', wide CJK as 2 rather than 1, Arabic alif as zero when immediately 
preceded by lam, one otherwise, etc.): I guess something like what you're 
doing above will be necessary because of the wide range of things that can 
happen.

The first two above are documented at :help strlen(), the third (in 
addition) at :help patterns-composing.
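
For readers on a later Vim: the counts listed above can also be sketched with built-in functions that did not exist when this thread was written. This is only an illustration, assuming Vim 7.3 or newer (for strchars() and strwidth()) and 'encoding' set to utf-8; LengthReport is a hypothetical helper name, not part of Vim or Align.

```vim
" A minimal sketch, assuming Vim 7.3+ and 'encoding' set to utf-8.
" LengthReport is a hypothetical name used only for this illustration.
function! LengthReport(str)
  " bytes:      what strlen() counts
  " codepoints: strchars() counts composing characters separately
  " cells:      strwidth() counts screen cells (wide CJK is 2, composing is 0)
  return {'bytes': strlen(a:str), 'codepoints': strchars(a:str), 'cells': strwidth(a:str)}
endfunction

" Latin a + combining circumflex: 3 bytes, 2 codepoints, 1 cell.
echo LengthReport("a\u0302")
" One wide CJK character: 3 bytes, 1 codepoint, 2 cells.
echo LengthReport("\u4e2d")
```

Tabs are deliberately left out of the sketch: their width depends on 'tabstop' and on the column where they start, which is what strdisplaywidth({string}, {col}) is meant for.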
  

Thank you, Tony, for that explanation!  I've modified Align so that the 
method used is selectable by the user.  Align v33d is available at my 
website (http://mysite.verizon.net/astronaut/vim/index.html#ALIGN) with 
these changes.

Regards,
Chip Campbell

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: utf-8, strlen(), and virtcol()

2007-09-20 Thread Tony Mechelynck

Charles E Campbell Jr wrote:
 Tony Mechelynck wrote:
 
 It all depends on what exactly you want to do. (I haven't read the Align.vim 
 docs.) The length of a UTF-8 string can be counted in several nonequivalent 
 ways:

 - number of bytes (Latin a + combining circumflex is three bytes):
  strlen(string)

 - number of codepoints (Latin a + combining circumflex is two codepoints):
  strlen(substitute(string, '.', 'x', 'g'))

 - number of spacing codepoints (Latin a + combining circumflex is one spacing 
 codepoint; a hard tab is one; wide and narrow CJK are one each; etc.): 
 (untested)
  strlen(substitute(string, '.\Z', 'x', 'g'))

 - virtual length (counting, for instance, tabs as anything between 1 and 
 'tabstop', wide CJK as 2 rather than 1, Arabic alif as zero when immediately 
 preceded by lam, one otherwise, etc.): I guess something like what you're 
 doing above will be necessary because of the wide range of things that can 
 happen.

 The first two above are documented at :help strlen(), the third (in 
 addition) at :help patterns-composing.
  

 Thank you, Tony, for that explanation!  I've modified Align so that the 
 method used is selectable by the user.  Align v33d is available at my 
 website (http://mysite.verizon.net/astronaut/vim/index.html#ALIGN) with 
 these changes.
 
 Regards,
 Chip Campbell

... and, in addition, when 'fileencoding' is nonempty and different from 
'encoding', the number of disk bytes used might be useful, but I don't know 
how Vim could get it, especially for encodings such as those used in Eastern 
Asia, where the number of bytes per character may vary in a way which is often 
not easily predictable from the UTF-8 representation. (The 2-or-4-bytes of 
UTF-16 is peanuts next to that, but Vim cannot use UTF-16 for its internal 
representation of the data because of the intervening nulls.)
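
One rough way to approximate the disk byte count, offered here only as a sketch: convert the string with iconv() and take strlen() of the result. ConvertedByteLen is a hypothetical helper name. The sketch assumes the conversion is one Vim can perform (most need the +iconv feature), and it ignores any BOM and line-ending differences. It cannot measure UTF-16 or other Unicode encodings, because Vim keeps those as UTF-8 inside strings.

```vim
" A minimal sketch; ConvertedByteLen is a hypothetical name.
" Returns the byte length of a:str after converting it from 'encoding'
" to a:target. iconv() returns an empty string when the conversion fails
" completely, so a result of 0 for a non-empty input means "unknown".
function! ConvertedByteLen(str, target)
  if a:target == '' || a:target ==? &encoding
    " Nothing to convert: the in-memory bytes are the answer.
    return strlen(a:str)
  endif
  return strlen(iconv(a:str, &encoding, a:target))
endfunction

" With 'encoding' set to utf-8, the u-umlaut takes two bytes in memory
" but only one in latin1.
echo ConvertedByteLen("gr\u00fcn", 'latin1')
```

For the current line of a buffer, the call would be ConvertedByteLen(getline('.'), &fileencoding).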


Best regards,
Tony.
-- 
Weiler's Law:
Nothing is impossible for the man who doesn't have to do it
himself.




Re: utf-8, strlen(), and virtcol()

2007-09-19 Thread Andy Wokula

Charles E Campbell Jr wrote:
 Nikolai Weibull wrote:
 
 On 9/19/07, Charles E Campbell Jr [EMAIL PROTECTED] wrote:

 let x='grün'
 echo 'strlen(x)='.strlen(x)

 Thus, strlen() returns 5, not 4 as one might (sometimes) expect.


 Here's what I have in one of my base libraries:

 function now#mbc#len(str)
  return strlen(substitute(a:str, '.', 'c', 'g'))
 endfunction

 Which is incredibly much better than your solution ;-).
  

 Well, I came up with another solution, but it still isn't as good as 
 yours!  Shouldn't strlen() just handle this on its own?  With C or C++, 
 one may be wanting to use the output of strlen() to help with allocating 
 memory to hold a string; I don't see any of that application with Vim.
 
 Regards,
 Chip Campbell

The multibyte strlen() is even suggested/documented here:
:h strlen()
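
To make the difference concrete, here is a small sketch of the idiom quoted above next to plain strlen(). It assumes 'encoding' is utf-8; CharLen is a hypothetical name chosen for this illustration.

```vim
" A minimal sketch, assuming 'encoding' is utf-8.
" CharLen is a hypothetical name; the body is the substitute() idiom
" quoted earlier in this thread.
function! CharLen(str)
  " Replace every character by a one-byte 'c', then count the bytes.
  return strlen(substitute(a:str, '.', 'c', 'g'))
endfunction

let g:word = "gr\u00fcn"
" strlen() counts bytes, and the u-umlaut takes two, so this shows 5.
echo strlen(g:word)
" Counting characters instead shows 4.
echo CharLen(g:word)
```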

-- 
Andy
