Re: utf-8, strlen(), and virtcol()
Tony Mechelynck wrote:
> It all depends on what exactly you want to do. (I haven't read the
> Align.vim docs.) The length of a UTF-8 string can be counted in several
> nonequivalent ways:
>
> - number of bytes (Latin a + combining circumflex is three bytes):
>       strlen(string)
> - number of codepoints (Latin a + combining circumflex is two
>   codepoints):
>       strlen(substitute(string, '.', 'x', 'g'))
> - number of spacing codepoints (Latin a + combining circumflex is one
>   spacing codepoint; a hard tab is one; wide and narrow CJK are one
>   each; etc.): (untested)
>       strlen(substitute(string, '.\Z', 'x', 'g'))
> - virtual length (counting, for instance, tabs as anything between 1
>   and 'tabstop', wide CJK as 2 rather than 1, Arabic alif as zero when
>   immediately preceded by lam, one otherwise, etc.): I guess something
>   like what you're doing above will be necessary because of the wide
>   range of things that can happen.
>
> The first two above are documented at :help strlen(), the third (in
> addition) at :help patterns-composing.

Thank you, Tony, for that explanation! I've modified Align so that the
method used is selectable by the user. Align v33d available at my website
(http://mysite.verizon.net/astronaut/vim/index.html#ALIGN) with these
changes.

Regards,
Chip Campbell

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---
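The four ways of counting that Tony lists can be compared side by side. The following is only a sketch, not what Align.vim does: it assumes 'encoding' is utf-8 and a Vim recent enough to provide the built-ins strchars() (including its optional skipcc argument) and strdisplaywidth(), neither of which existed when this thread was written. The function name LengthReport is made up for illustration.

```vim
" Sketch only. Assumes 'encoding' is utf-8 and a Vim that provides
" strchars() (with the optional skipcc argument) and strdisplaywidth().
" LengthReport is a hypothetical name, not part of Vim or Align.vim.
function! LengthReport(str)
  let l:r = {}
  " Bytes: what strlen() has always returned.
  let l:r.bytes = strlen(a:str)
  " Codepoints: composing characters are counted separately.
  let l:r.codepoints = strchars(a:str)
  " Spacing codepoints: composing characters are skipped.
  let l:r.spacing = strchars(a:str, 1)
  " Screen cells when displayed from column 0; tabs depend on 'tabstop'.
  let l:r.cells = strdisplaywidth(a:str)
  return l:r
endfunction

" Latin a followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
echo LengthReport("a\u0302")
" Two wide CJK characters, U+65E5 and U+672C.
echo LengthReport("\u65e5\u672c")
```

Under those assumptions, the first call should report 3 bytes, 2 codepoints, 1 spacing codepoint and 1 cell; the second should report 6 bytes, 2 codepoints, 2 spacing codepoints and 4 cells. Context-dependent cases such as Arabic shaping are not covered by this sketch.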
Re: utf-8, strlen(), and virtcol()
Charles E Campbell Jr wrote:
> Tony Mechelynck wrote:
>> It all depends on what exactly you want to do. (I haven't read the
>> Align.vim docs.) The length of a UTF-8 string can be counted in several
>> nonequivalent ways:
>>
>> - number of bytes (Latin a + combining circumflex is three bytes):
>>       strlen(string)
>> - number of codepoints (Latin a + combining circumflex is two
>>   codepoints):
>>       strlen(substitute(string, '.', 'x', 'g'))
>> - number of spacing codepoints (Latin a + combining circumflex is one
>>   spacing codepoint; a hard tab is one; wide and narrow CJK are one
>>   each; etc.): (untested)
>>       strlen(substitute(string, '.\Z', 'x', 'g'))
>> - virtual length (counting, for instance, tabs as anything between 1
>>   and 'tabstop', wide CJK as 2 rather than 1, Arabic alif as zero when
>>   immediately preceded by lam, one otherwise, etc.): I guess something
>>   like what you're doing above will be necessary because of the wide
>>   range of things that can happen.
>>
>> The first two above are documented at :help strlen(), the third (in
>> addition) at :help patterns-composing.
>
> Thank you, Tony, for that explanation! I've modified Align so that the
> method used is selectable by the user. Align v33d available at my website
> (http://mysite.verizon.net/astronaut/vim/index.html#ALIGN) with these
> changes.
>
> Regards,
> Chip Campbell

... and, in addition, when 'fileencoding' is nonempty and different from
'encoding', the number of disk bytes used might be useful, but I don't know
how Vim could get it, especially for encodings such as those used in Eastern
Asia, where the number of bytes per character may vary in a way which is
often not easily predictable from the UTF-8 representation. (The
2-or-4-bytes of UTF-16 is peanuts next to that, but Vim cannot use UTF-16
for its internal representation of the data because of the intervening
nulls.)

Best regards,
Tony.
--
Weiler's Law: Nothing is impossible for the man who doesn't have to do it
himself.
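One possible approximation of the on-disk byte count Tony mentions is to convert the string with iconv() and measure the result. This is a hedged sketch, not a known-good solution: DiskByteLen is a hypothetical name, most conversions need a Vim built with +iconv, characters that cannot be converted may be replaced by "?", and it cannot work for UTF-16 or UCS-2 because Vim strings cannot hold the NUL bytes those encodings produce. It also ignores any BOM and line-ending bytes written to the file.

```vim
" Sketch only. DiskByteLen is a hypothetical helper. Assumes the target
" encoding never produces NUL bytes (so not UTF-16/UCS-2) and, for most
" encodings, a Vim compiled with the +iconv feature.
function! DiskByteLen(str)
  " An empty 'fileencoding' means the file is written in 'encoding'.
  let l:fenc = empty(&fileencoding) ? &encoding : &fileencoding
  if l:fenc ==# &encoding
    return strlen(a:str)
  endif
  let l:converted = iconv(a:str, &encoding, l:fenc)
  " iconv() returns an empty string when the conversion fails completely.
  if empty(l:converted) && !empty(a:str)
    return -1
  endif
  return strlen(l:converted)
endfunction
```

For example, with 'encoding' set to utf-8 and 'fileencoding' set to latin1, DiskByteLen("gr\u00fcn") should give 4, while strlen() on the same string gives 5.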
Re: utf-8, strlen(), and virtcol()
Charles E Campbell Jr wrote:
> Nikolai Weibull wrote:
>> On 9/19/07, Charles E Campbell Jr wrote:
>>>   let x='grün'
>>>   echo "strlen(x)=".strlen(x)
>>>
>>> Thus, strlen() returns 5, not 4 as one might (sometimes) expect.
>>
>> Here's what I have in one of my base libraries:
>>
>>   function now#mbc#len(str)
>>     return strlen(substitute(a:str, '.', 'c', 'g'))
>>   endfunction
>>
>> Which is incredibly much better than your solution ;-).
>
> Well, I came up with another solution, but it still isn't as good as
> yours!
>
> Shouldn't strlen() just handle this on its own? With C or C++, one may
> want to use the output of strlen() to help with allocating memory to
> hold a string; I don't see any such application in Vim.
>
> Regards,
> Chip Campbell

The multibyte strlen() is even suggested/documented here:
    :h strlen()

--
Andy
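The difference discussed in this message can be reproduced with a short script. This is a minimal sketch assuming 'encoding' is utf-8; MbcLen is a hypothetical global stand-in for Nikolai's autoload function now#mbc#len(), and the ü is written as an escape so the example does not depend on how the script file itself is encoded.

```vim
" Sketch only; assumes 'encoding' is utf-8.
" MbcLen is a hypothetical stand-in for now#mbc#len() from the thread.
function! MbcLen(str)
  " Replace every character with a one-byte 'c', then count the bytes.
  return strlen(substitute(a:str, '.', 'c', 'g'))
endfunction

" The word from the thread, with u-umlaut written as U+00FC.
let x = "gr\u00fcn"
echo 'strlen(x)=' . strlen(x)
echo 'MbcLen(x)=' . MbcLen(x)
```

With a UTF-8 'encoding' this should print strlen(x)=5 and MbcLen(x)=4, because ü occupies two bytes but is a single character.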