>> I am eager to try out the improved *parser that deals with Unicode
>> buffers, described in release notes for "Testing release 7.7.91
>> (pending)" [1]. That's why I've tried to build the code from CVS in the
>> first place.
> I see. You don't need to build from CVS for that -- the Debian package
> you have installed has that update already.
Here is a simple test case.
Let's say I have a text file "sample-data.txt" which is encoded in UTF-8
and contains just one line:
----------8<----------
Когато бях овчарче и овците пасях...
----------8<----------
Then executing the following simple program:
----------8<----------
(load-option '*parser)

(define full-alphabet
  (code-points->alphabet (list (cons #x0 #xD7FF)
                               (cons #xE000 #xFFFD)
                               (cons #x10000 (-1+ char-code-limit)))))

(call-with-input-file "sample-data.txt"
  (lambda (port)
    (display ((*parser (seq (match (* (alphabet full-alphabet)))))
              (input-port->parser-buffer port)))))
----------8<----------
should display:
#(Когато бях овчарче и овците пасях...
)
but it displays:
#(ÐогаÑо бÑÑ
овÑаÑÑе и овÑиÑе паÑÑÑ
...
)
This looks like a UTF-8 byte sequence decoded as ISO-8859-1. Or perhaps
the data itself is parsed properly, but the `display' procedure is not
capable of handling it?
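The ISO-8859-1 guess can be checked without the parser at all. This is a minimal sketch that assumes a Scheme providing the R7RS bytevector procedures (`string->utf8', `bytevector-u8-ref'). The names `sample-word' and `latin1-misdecode' are hypothetical. The code encodes the first word of the sample line as UTF-8. It then turns every byte into the character with the same code point, which is what an ISO-8859-1 decoder does.

```scheme
;; "Когато" (first word of sample-data.txt), built from code points so
;; this snippet does not depend on the reader's source-file encoding.
(define sample-word
  (list->string
   (map integer->char '(#x41A #x43E #x433 #x430 #x442 #x43E))))

;; Simulate reading UTF-8 data through an ISO-8859-1 decoder:
;; each byte becomes the character whose code point equals the byte.
(define (latin1-misdecode s)
  (let* ((bytes (string->utf8 s))
         (n (bytevector-length bytes)))
    ;; Walk the bytes from last to first, consing onto the front,
    ;; so the resulting list is in the original order.
    (let loop ((i (- n 1)) (chars '()))
      (if (< i 0)
          (list->string chars)
          (loop (- i 1)
                (cons (integer->char (bytevector-u8-ref bytes i))
                      chars))))))

;; 6 Cyrillic letters -> 12 bytes -> 12 Latin-1 characters.
(define garbled (latin1-misdecode sample-word))
```

Each Cyrillic letter comes out as two characters. The first is Ð (U+00D0) or Ñ (U+00D1), the same characters seen in the output above. The letter х (U+0445, UTF-8 bytes D1 85) would become Ñ followed by U+0085 (NEXT LINE). That may account for the unexpected line breaks in that output.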
What am I doing wrong?
My locale is LANG=bg_BG.UTF-8.
_______________________________________________
MIT-Scheme-users mailing list
http://lists.gnu.org/mailman/listinfo/mit-scheme-users