Re: [Chicken-users] Codepoint indices for matched regexps (UTF-8)?
On Fri, Jun 15, 2018 at 9:44 AM, Henry Hu wrote:

> I tried (use utf8), but it is documented that it doesn't affect irregex and
> it sure enough doesn't. I tried using the 'utf8 option while compiling my
> regex, but it doesn't change the index returned by
> irregex-match-start-index.

Do "(use utf8)" and then "(import utf8-lolevel)" to get the (undocumented)
low-level utf8 API. The function utf8-offset->index accepts a string and a
byte offset and returns a codepoint index. If you want to go the other way,
utf8-index->offset is also provided.

--
John Cowan    http://vrici.lojban.org/~cowan    co...@ccil.org
I don't know half of you half as well as I should like, and I like less
than half of you half as well as you deserve. --Bilbo

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
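The suggestion in this reply, applied to the example from the original post, might look like the sketch below. It assumes CHICKEN 4 with the utf8 egg installed, and that utf8-offset->index takes the string first and the byte offset second, as the reply describes. The names text, m and byte-offset are only illustrative.

```scheme
;; A sketch only, assuming CHICKEN 4 with the utf8 egg installed.
(use irregex utf8)
(import utf8-lolevel)   ; undocumented low-level API mentioned in the reply

;; Illustrative names; the string and pattern come from the original post.
(define text "čččČččč")
(define m (irregex-search (irregex "Č" 'utf8) text))

;; irregex reports the match start as a byte offset
;; (6 for this input, according to the original post).
(define byte-offset (irregex-match-start-index m))

;; Convert the byte offset to a codepoint index.
;; The original post expects 3 for this input.
(print (utf8-offset->index text byte-offset))
```

Going the other way, utf8-index->offset should turn a codepoint index back into a byte offset, which is the form irregex works with.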
[Chicken-users] Codepoint indices for matched regexps (UTF-8)?
Hello world!

I am trying to use unit irregex to match regular expressions in UTF-8 text. Is anyone familiar with a way to ask for the codepoint indices rather than byte indices for the match? For example:

    (irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6 when I want it to return 3, since there are 3 characters (6 bytes) before my match.

I tried (use utf8), but it is documented that it doesn't affect irregex and it sure enough doesn't. I tried using the 'utf8 option while compiling my regex, but it doesn't change the index returned by irregex-match-start-index.

Thank you for any ideas you might have!