Bruno Haible wrote on 2000-09-25 15:07 UTC:
> > isControl  c = c < ' ' || c >= '\x7F' && c <= '\x9F'
> 
> Here I would add:  category is one of [Zl,Zp]
> because the Line/Paragraph Separators behave like LineFeed.

This depends on the environment (locale?).
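The two classification policies being weighed here can be sketched in Haskell. This is an illustrative sketch only: `isControl` reproduces the quoted predicate, and `isControlWithSeparators` is a hypothetical name for the suggested variant that also counts Zl/Zp (U+2028, U+2029) as control characters.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Predicate as quoted in the thread: C0 controls (below space),
-- DEL (U+007F), and the C1 range U+0080..U+009F.
isControl :: Char -> Bool
isControl c = c < ' ' || c >= '\x7F' && c <= '\x9F'

-- Hypothetical variant reflecting the suggestion to also treat the
-- Line Separator (U+2028, category Zl) and the Paragraph Separator
-- (U+2029, category Zp) as control characters.
isControlWithSeparators :: Char -> Bool
isControlWithSeparators c =
  isControl c || generalCategory c `elem` [LineSeparator, ParagraphSeparator]

main :: IO ()
main = do
  print (map isControl ['\n', '\x2028', '\x2029'])               -- [True,False,False]
  print (map isControlWithSeparators ['\n', '\x2028', '\x2029']) -- [True,True,True]
```

Only the second variant changes how LS/PS are classified; the quoted predicate leaves them alone.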

In xterm and other UCS terminal emulators, LS/PS are currently treated
like any other undefined character: a default-character box is printed.

The Line/Paragraph Separators might perhaps behave like LineFeed inside
some word processors (which ones?). I very much hope that they will not
show up in UTF-8 plain text files on POSIX systems. Introducing an
alternative to LF would significantly break the original ASCII
compatibility of UTF-8, with security consequences as severe as those of
decoding overlong UTF-8 sequences. For POSIX applications that parse plain
text files, treating LS/PS just like any other undefined character is
probably the best thing to do. In other words, all the is????()
functions should return 0 for them.
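To illustrate why an alternative LF is a compatibility and security hazard, here is a minimal Haskell sketch. The names `utf8Encode` and `hasLF` are hypothetical helpers written for this example (the encoder does no surrogate checking). The UTF-8 encoding of LS (U+2028) is E2 80 A8 and contains no 0x0A byte, so a byte-oriented tool that only splits on LF sees one line where an LS-aware parser would see two. Meanwhile, the standard `Data.Char.isControl` already returns False for LS.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (generalCategory, isControl, ord)
import Data.Word (Word8)

-- Hypothetical minimal UTF-8 encoder for one code point
-- (illustration only: no surrogate or range validation).
utf8Encode :: Char -> [Word8]
utf8Encode ch
  | n < 0x80 = [fromIntegral n]
  | n < 0x800 = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord ch
    -- Leading byte payload: the bits above the given shift.
    lead s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying 6 bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical byte-oriented check, as a POSIX line splitter would do:
-- it only recognises the LF byte 0x0A.
hasLF :: [Word8] -> Bool
hasLF = elem 0x0A

main :: IO ()
main = do
  let ls = '\x2028'
  print (utf8Encode ls)         -- [226,128,168], i.e. E2 80 A8
  print (hasLF (utf8Encode ls)) -- False: no LF byte anywhere
  print (generalCategory ls)    -- LineSeparator
  print (isControl ls)          -- False
```

The mismatch between the byte-level view (no line break) and a hypothetical LS-aware view (line break) is the same kind of disagreement between filters and consumers that made overlong UTF-8 sequences dangerous.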

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
