Yes, this helps. Kind of ;-) ... using the character set
char-set:alphabetic, my umlauts are now parsed. But I don't get them back
in my result, at least not as printable characters. Instead, the following
happens, and utterly confuses me:

#;2> (define s3 (parse letters (string->list s)))
#;3> s3
"Gnsesger"
#;4> (string-length s3)
6
#;5> (string->list s3)
(#\G #\x4bb3 #\e #\s #\x49e5 #\r)
#;6> (list->string (string->list s3))
"G䮳es䧥r"


So, I put the parse result into 's3'. Printing it, I read an
eight-character string, namely the one I want, minus my beloved umlauts.
'string-length' reports that string as six characters long, and
'string->list' gives me exactly six characters, swallowing still other
ASCII characters of my string. Converting that back with 'list->string'
then yields Chinese characters ... even though
'(list->string (string->list s1))', with my pure ASCII string,
round-trips without fault.

I guess I have some problems understanding some utf8 concepts?!

/Christoph

On Mon, Feb 17, 2020 at 3:38 PM <ko...@upyum.com> wrote:

> Christoph Lange <christ...@clange.de> wrote:
> > meaning, that the ä isn't recognized as being a letter within the
> > 'char-set:letter'.
>
> The utf8 egg’s srfi-14 character sets are designed to be compatible with
> the original srfi-14 and only contain ASCII characters, as stated in the
> documentation:
> https://wiki.call-cc.org/eggref/5/utf8#unicode-char-sets
> “The default SRFI-14 char-sets are defined using ASCII-only characters”
>
> You might want to import the unicode-char-sets module, and use one of its
> sets, like char-set:alphabetic.
>
> I hope this helps. :)
>


-- 
Christoph Lange
Lotsarnas Väg 8
430 83 Vrångö
