Re: [libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes

Eric Blake Tue, 28 Sep 2010 12:32:39 -0700

On 09/28/2010 04:28 AM, Stefan Berger wrote:

okay.  It also leaves out 8-bit bytes - could that be a problem for i18n
where people want comments with native-language accented characters?
That is, are we being too strict here?  Maybe a better pattern would be
to reject specific non-printing ASCII bytes we want to avoid, assuming
you can use escape sequences like [^\001]?


Looking at

http://www.asciitable.com/

I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
more? So the regex then becomes

[&#x20;-&#x7E;&#128;-&#175;&#224;-&#238;]{0,256}

True ASCII is strictly 7-bit; any locale where isprint() returns true on
8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
'extended ascii' from the URL you posted above.  But I'm also thinking
about multi-byte encodings, like UTF-8, where we cannot a priori write a
regex that will accept all valid Unicode printable characters, in part
because you have to look at more than one byte at a time to determine if
you have a printable character.  Which goes back to my suggestion of an
inverse charset - rejecting bytes that are known to be non-printable
ASCII, and letting everything else through, whether or not it is a
printable byte sequence in the current locale.  So what about this idea:
exclude control characters except for tab, and let space and everything
after through (I don't know if it needs to be adjusted to also reject
&#x00;):


[^&#x01;-&#x08;&#x0A;-&#x1F;]{0,256}
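
To make the difference between the two approaches concrete, here is a rough sketch in Python, not the schema itself. It assumes the two patterns can be translated directly into Python `re` syntax, and it matches on Unicode code points rather than on the raw bytes a schema validator would see, so treat it as an illustration only. The names `ALLOW`, `DENY` and `accepts` are made up for this example.

```python
import re

# Allow-list from the quoted proposal: 0x20-0x7E, 128-175 and 224-238.
ALLOW = re.compile(r"[\x20-\x7e\x80-\xaf\xe0-\xee]{0,256}")

# Inverse charset: reject only control characters 0x01-0x08 and 0x0A-0x1F,
# which leaves tab (0x09), space and everything after it allowed.
DENY = re.compile(r"[^\x01-\x08\x0a-\x1f]{0,256}")


def accepts(pattern, comment):
    """Return True if the whole comment matches the pattern.

    fullmatch() is used because schema patterns apply to the entire value.
    """
    return pattern.fullmatch(comment) is not None


if __name__ == "__main__":
    samples = [
        "plain ASCII comment",
        "allow ssh\tfrom admin hosts",  # contains a tab
        "日本語のコメント",              # multi-byte in UTF-8
        "bell\x07",                      # control character
        "nul\x00",                       # not excluded by either range above
    ]
    for text in samples:
        print(repr(text), accepts(ALLOW, text), accepts(DENY, text))
```

In this sketch the inverse charset accepts the tab and the non-Latin comment, which the allow-list rejects, and it still rejects the bell character. It also accepts a NUL, which is the open question raised above.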

--
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
