Re: [Chicken-users] Codepoint indices for matched regexps (UTF-8)?

2018-06-15 Thread John Cowan
On Fri, Jun 15, 2018 at 9:44 AM, Henry Hu wrote:

> I tried (use utf8), but it is documented that it doesn't affect irregex and
> it sure enough doesn't.  I tried using the 'utf8 option while compiling my
> regex, but it doesn't change the index returned by
> irregex-match-start-index.
>

Do "(use utf8)" and then "(import utf8-lolevel)" to get the (undocumented)
low-level utf8 API.  The function utf8-offset->index accepts a string and a
byte offset and returns a codepoint index.  If you want to go the other
way, utf8-index->offset is also provided.
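
To make the conversion concrete, here is a minimal sketch of what turning a
byte offset into a codepoint index (and back) involves. It is written in
Python purely as a language-neutral illustration; the function names are
hypothetical and this is not the utf8 egg's implementation, only the
underlying arithmetic on UTF-8 bytes.

```python
def byte_offset_to_codepoint_index(text: str, byte_offset: int) -> int:
    """Count the codepoints that start before byte_offset in text's UTF-8 form.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # Every codepoint contributes exactly one byte that is NOT a
    # continuation byte (continuation bytes look like 10xxxxxx).
    return sum(1 for b in data[:byte_offset] if (b & 0xC0) != 0x80)


def codepoint_index_to_byte_offset(text: str, index: int) -> int:
    """Return the UTF-8 byte offset at which codepoint number index starts.

    Hypothetical helper for illustration only.
    """
    if not 0 <= index <= len(text):
        raise ValueError("codepoint index out of range")
    return len(text[:index].encode("utf-8"))


if __name__ == "__main__":
    s = "čččČččč"
    # Each of these letters takes two bytes in UTF-8, so byte offset 6
    # corresponds to codepoint index 3, as in the original question.
    print(byte_offset_to_codepoint_index(s, 6))  # 3
    print(codepoint_index_to_byte_offset(s, 3))  # 6
```

In the case from the question, the byte offset 6 reported for the match
maps to codepoint index 3, which is the value being asked for.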

-- 
John Cowan  http://vrici.lojban.org/~cowan  co...@ccil.org
I don't know half of you half as well as I should like, and I like less
than half of you half as well as you deserve.  --Bilbo
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


[Chicken-users] Codepoint indices for matched regexps (UTF-8)?

2018-06-15 Thread Henry Hu
Hello world!

I am trying to use unit irregex to match regular expressions in UTF-8
text.  Is anyone familiar with a way to ask for the codepoint indices
rather than byte indices for the match?

For example:

(irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6 when I want it to return 3, since there are 3 characters (6
bytes) before my match.

I tried (use utf8), but it is documented that it doesn't affect irregex and
it sure enough doesn't.  I tried using the 'utf8 option while compiling my
regex, but it doesn't change the index returned by
irregex-match-start-index.

Thank you for any ideas you might have!