[MIT-Scheme-users] *parser and UTF-8 (was: Staging problem)

Kaloian Doganov Thu, 29 Jun 2006 12:09:35 -0700

        > I am eager to try out the improved *parser that deals with Unicode
        > buffers, described in release notes for "Testing release 7.7.91
        > (pending)" [1].  That's why I've tried to build the code from CVS in
        > the first place.


        I see.  You don't need to build from CVS for that -- the Debian package
        you have installed has that update already.

Here is a simple test case.

Let's say I have a text file "sample-data.txt" which is encoded in UTF-8
and contains just one line:

----------8<----------
Когато бях овчарче и овците пасях...
----------8<----------

Then executing the following simple program:

----------8<----------
(load-option '*parser)

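;; Alphabet of all code points except the surrogate range
;; (#xD800-#xDFFF) and #xFFFE/#xFFFF.
(define full-alphabet
  (code-points->alphabet (list (cons #x0 #xD7FF)
                               (cons #xE000 #xFFFD)
                               (cons #x10000 (-1+ char-code-limit)))))

;; Match zero or more characters from the alphabet and display the
;; resulting parser value.
(call-with-input-file "sample-data.txt"
  (lambda (port)
    (display ((*parser (seq (match (* (alphabet full-alphabet)))))
              (input-port->parser-buffer port)))))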
----------8<----------

should display:

   #(Когато бях овчарче и овците пасях...
   )

but it displays:

   #(ÐÐ¾Ð³Ð°ÑÐ¾ Ð±ÑÑ Ð¾Ð²ÑÐ°ÑÑÐµ Ð¸ Ð¾Ð²ÑÐ¸ÑÐµ Ð¿Ð°ÑÑÑ...
   )

This looks like a UTF-8 byte sequence parsed as ISO-8859-1.  Or perhaps
the data itself is parsed properly, but the `display' procedure is not
able to handle it?
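
To make the first guess concrete, here is a minimal sketch of what reading
UTF-8 data as ISO-8859-1 does to a single Cyrillic letter.  It is only an
illustration under assumptions: it relies on the R7RS procedures
`string->utf8', `bytevector-length' and `bytevector-u8-ref', which are not
part of the program above and may not exist in the release I have
installed, and `latin1-misread' is a hypothetical helper name.

```scheme
;; Assumption: an R7RS Scheme with bytevectors.  In a strict R7RS program
;; this would also need (import (scheme base)).
(define (latin1-misread s)
  ;; Encode S as UTF-8, then wrongly turn every byte into a character of
  ;; its own, which is what decoding UTF-8 data as ISO-8859-1 amounts to.
  (let* ((bytes (string->utf8 s))
         (n (bytevector-length bytes)))
    (let loop ((i (- n 1)) (chars '()))
      (if (< i 0)
          (list->string chars)
          (loop (- i 1)
                (cons (integer->char (bytevector-u8-ref bytes i))
                      chars))))))

;; U+041A (CYRILLIC CAPITAL LETTER KA) is built from its code point so
;; that this file's own encoding does not matter.  Its UTF-8 encoding is
;; the two bytes #xD0 #x9A, so the one letter becomes two characters.
(map char->integer
     (string->list (latin1-misread (string (integer->char #x41A)))))
;; => (208 154)
```

Character 208 is the "Ð" that keeps showing up in the output, and 154 is
an invisible C1 control character, which would explain why some of the
"Ð" in the garbled line have nothing visible after them.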

What am I doing wrong?

My locale is LANG=bg_BG.UTF-8.


--
"Every non-free program has a lord, a master --
and if you use the program, he is your master." --RMS
___________________________________________________________
If I do not reply to your mail: http://6lyokavitza.org/mail
