Re: [9fans] Octets regexp

erik quanstrom Thu, 02 May 2013 05:51:04 -0700

> Regexp(6) handles "characters" that are runes.

perhaps the man page is misleading.  rune in this context means a
utf-8 encoded character, not a Rune array.  see regexp(2): regcomp,
regexec and regsub all take char*s (rregexec and rregsub are the
Rune* variants).
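
to make the rune/octet distinction concrete, here is a minimal sketch.
it uses python's re module as a stand-in, not plan 9's libregexp, so
the calls below are an assumption for illustration only.  the same
pattern is run once over a decoded string and once over its raw utf-8
bytes:

```python
import re

text = "é"                     # one code point
utf8 = text.encode("utf-8")    # the same character as two octets: b'\xc3\xa9'

# character-level matching: '.' consumes one whole code point
rune_match = re.match(r".", text).group()

# octet-level matching: '.' consumes a single byte of the encoding
octet_match = re.match(rb".", utf8).group()

print(len(utf8), repr(rune_match), repr(octet_match))
```

an octet regexp can split a multibyte character in the middle, which
is the cost of treating utf-8 text as plain bytes.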


> I wonder if Plan9 developers, when trying to design a way towards some
> localization, have ever thought of bytes (octets) regexp, that is using
> regexp with not rune but octets strings (maybe UTF-8 as is) allowing to
> use regexp with binary too, not only newline terminated chunks etc.?

one of the points of plan 9 was to standardize on one character set,
utf-8.  imho, localization and character set aren't related unless one
is dealing with 8859-x overlays or some other character set insufficient
to represent the range of languages.

however, sam and acme allow for structural regular expressions,
and are generally not line oriented:

http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf
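
as a rough sketch of the idea, and not of sam's actual implementation,
the loop below imitates a nested extraction: an outer pattern selects
chunks from the whole buffer, newlines included, and an inner pattern
runs only inside each chunk.  the helper name x, the buffer contents
and the delimiters are all hypothetical, and python's re module over
bytes stands in for the real thing:

```python
import re

def x(pattern, buf, start=0, end=None):
    """yield (start, end) for each match of pattern inside buf[start:end]."""
    end = len(buf) if end is None else end
    prog = re.compile(pattern, re.DOTALL)   # '.' also matches newline
    for m in prog.finditer(buf, start, end):
        yield m.span()

# hypothetical binary buffer: chunks delimited by \x00\x01 ... \xff
buf = b"\x00\x01abc\nxyz\xff\x00\x01q\xff"

words = []
for s, e in x(rb"\x00\x01.*?\xff", buf):       # outer: each chunk
    for ws, we in x(rb"[a-z]+", buf, s, e):    # inner: lowercase runs in it
        words.append(buf[ws:we])

print(words)
```

the newline inside the first chunk is just another byte here; nothing
in the loop depends on line boundaries.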

and iirc, cinap has written some cifs code that does a bit of binary matching.

- erik
