> I am eager to try out the improved *parser that deals with Unicode
> buffers, described in release notes for "Testing release 7.7.91
> (pending)" [1]. That's why I've tried to build the code from CVS in the
> first place.

I see. You don't need to build from CVS for that -- the Debian package
you have installed has that update already.

Here is a simple test case. Let's say I have a text file
"sample-data.txt" which is encoded in UTF-8 and contains just one line:

----------8<----------
Когато бях овчарче и овците пасях...
----------8<----------

Then executing the following simple program:

----------8<----------
(load-option '*parser)

;; Every Unicode code point except the surrogates and U+FFFE/U+FFFF.
(define full-alphabet
  (code-points->alphabet
   (list (cons #x0 #xD7FF)
         (cons #xE000 #xFFFD)
         (cons #x10000 (-1+ char-code-limit)))))

;; Match everything in the file and display the parser's result.
(call-with-input-file "sample-data.txt"
  (lambda (port)
    (display ((*parser (seq (match (* (alphabet full-alphabet)))))
              (input-port->parser-buffer port)))))
----------8<----------

should display:

#(Когато бях овчарче и овците пасях )

but it displays:

#(ÐогаÑо бÑÑ Ð¾Ð²ÑаÑÑе и овÑиÑе паÑÑÑ ... )

This looks like a UTF-8 byte sequence parsed as ISO-8859-1. Or perhaps
the data itself is parsed properly, but the `display' procedure is not
capable of handling it? What am I doing wrong?

My locale is LANG=bg_BG.UTF-8.

-- 
"Every non-free program has a lord, a master --
and if you use the program, he is your master."    --RMS
___________________________________________________________
If I don't reply to your emails: http://6lyokavitza.org/mail
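
The suspicion that the output is UTF-8 data read as ISO-8859-1 can be
checked outside Scheme. What follows is only a minimal sketch in Python,
not MIT Scheme code: it says nothing about how *parser or `display'
work internally, and the variable names are made up for illustration.
It encodes the sample line as UTF-8, decodes the resulting bytes as
ISO-8859-1 (one character per byte), and then reverses the mistake.

```python
# Illustration only: reproduces the suspected misreading in Python,
# independent of MIT Scheme. All names here are hypothetical.
text = "Когато бях овчарче и овците пасях..."

# The bytes that a UTF-8 encoded file would contain for this line.
utf8_bytes = text.encode("utf-8")

# Misread those bytes as ISO-8859-1: every byte becomes one character.
misread = utf8_bytes.decode("iso-8859-1")

# Reverse the mistake: back to bytes, then decode as UTF-8.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(repr(misread))      # repr() keeps control characters visible
print(recovered == text)
```

Each Cyrillic letter in the sample takes two bytes in UTF-8, with lead
byte 0xD0 or 0xD1. Read as ISO-8859-1, those lead bytes are the
characters Ð and Ñ, which is the pattern visible in the garbled output.
The sketch cannot tell whether the bytes are misread when the port is
read or when the result is written.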