Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

Ladislav Lhotka Thu, 24 Aug 2017 08:54:03 -0700

Per Hedeland <p...@tail-f.com> writes:

> I strongly agree with all of Juergen's statements, and disagree also
> with the suggestion to include the parts of the text that he didn't
> specifically disagree with. And I'd like to add that the "lack of XSD
> support" argument is pretty weak - there exists at least one freely
> available implementation in the form of libxml2, which is actually
> present by default in basically all "normal" Linux installations.
> It is portable C code, and the parts needed for regexp matching amount
> to just above 100 kB of compiled code on an x86_64 CPU.


I wouldn't be so strict here. Libxml2 has its share of problems - for
one, its "official" bindings do not support Python 3, so e.g. in Yangson
I had to use the PyXB package instead, and pyang gives up pattern
validation in Python 3 entirely.
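
Without libxml2, a pure-Python fallback has to bridge at least two syntactic differences before the standard `re` module can even approximate YANG/XSD pattern matching. XSD patterns always match the whole value, and `^` and `$` are ordinary characters in XSD. The following is only a minimal sketch under those assumptions. `xsd_to_python` and `yang_pattern_matches` are hypothetical helpers, not functions from pyang, Yangson or libxml2, and the sketch ignores XSD-only syntax such as character class subtraction and the `\i`/`\c` escapes.

```python
import re


def xsd_to_python(xsd_pattern: str) -> str:
    # Hypothetical helper (not from pyang, Yangson or libxml2). It handles
    # only two differences between XSD and Python regexes:
    #  1. XSD patterns implicitly match the whole value, so the result is
    #     wrapped in \A(?:...)\Z (Python's \Z matches only at the very end).
    #  2. In XSD, ^ and $ outside a character class are ordinary characters,
    #     so they are escaped here.
    # Not handled: class subtraction like [a-z-[aeiou]], \i and \c escapes,
    # \p{IsBlock} names, and other XSD-only syntax.
    out = []
    depth = 0  # nesting level of [...] character classes
    i = 0
    while i < len(xsd_pattern):
        c = xsd_pattern[i]
        if c == "\\":
            # Copy an escape sequence (backslash plus next char) unchanged.
            out.append(xsd_pattern[i:i + 2])
            i += 2
            continue
        if c == "[":
            depth += 1
        elif c == "]" and depth > 0:
            depth -= 1
        elif c in "^$" and depth == 0:
            # Literal ^ or $ in XSD; escape it for Python.
            out.append("\\" + c)
            i += 1
            continue
        out.append(c)
        i += 1
    return r"\A(?:" + "".join(out) + r")\Z"


def yang_pattern_matches(xsd_pattern: str, value: str) -> bool:
    # Hypothetical wrapper: convert, then match against the whole value.
    return re.match(xsd_to_python(xsd_pattern), value) is not None


print(yang_pattern_matches("[0-9]+", "123"))    # True
print(yang_pattern_matches("[0-9]+", "123\n"))  # False
print(yang_pattern_matches("a$b", "a$b"))       # True
```

The sketch uses `\Z` rather than `$` because Python's `$` also matches just before a trailing newline, which would wrongly accept "123\n".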

That being said, there doesn't seem to be a clearly superior
replacement, and some aspects of XSD regexes, such as support for
Unicode and the absence of ^ and $ anchors, make a lot of sense in
YANG. So I am also not in favour of the proposed change.
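
Unicode support also means that \d and [0-9] are not interchangeable in XSD: Appendix F of [XSD-TYPES] defines \d as \p{Nd}, i.e. any Unicode decimal digit, not only ASCII 0-9. A small illustration follows. It uses Python's `re` module as a stand-in, on the assumption that its `\d` for `str` patterns follows the same Nd rule. That holds for this escape, not for XSD syntax in general.

```python
import re

# XSD defines \d as \p{Nd}. Python's re uses the same rule for str patterns
# (unless re.ASCII is given), so it can illustrate the difference.
ARABIC_INDIC_THREE = "\u0663"  # ARABIC-INDIC DIGIT THREE, general category Nd

for pattern in (r"\d", "[0-9]"):
    matched = re.fullmatch(pattern, ARABIC_INDIC_THREE) is not None
    print(pattern, "->", matched)
```

This prints `\d -> True` and `[0-9] -> False`. Rewriting \d as [0-9] therefore changes which values a YANG pattern accepts.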

BTW, it is actually a shame that there is no standard regex language
that could be easily used in all programming languages. Oh well ...
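
Python's own standard library illustrates the point: its `re` module has no \p{...} category escapes at all, so a YANG pattern such as \p{L}+ cannot be used there verbatim. The sketch below shows the failure and a hand-written fallback. `matches_letters` is a hypothetical helper that emulates only this one pattern by checking Unicode general categories, not a general replacement for an XSD engine.

```python
import re
import unicodedata

# Python's standard re module has no \p{...} escapes; since Python 3.6 an
# unknown escape such as \p is rejected when the pattern is compiled.
try:
    re.compile(r"\p{L}+")
    STDLIB_HAS_P = True
except re.error:
    STDLIB_HAS_P = False


def matches_letters(value: str) -> bool:
    # Hypothetical stand-in for the XSD pattern \p{L}+: a non-empty value
    # whose characters all have a Unicode general category starting with
    # "L" (Lu, Ll, Lt, Lm, Lo).
    return bool(value) and all(
        unicodedata.category(ch).startswith("L") for ch in value
    )


print(STDLIB_HAS_P)             # False
print(matches_letters("été"))   # True
print(matches_letters("abc1"))  # False
```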

Lada

>
> --Per
>
> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
>> On Wed, Aug 23, 2017 at 09:20:36PM +0000, Xufeng Liu wrote:
>>> Members of Routing Area YANG DT have had some discussions about the
>>> handling of various variants of regular expressions. The following is the
>>> current state, and we are wondering whether this topic could be added to
>>> RFC6087bis:
>>>
>>> 1. Regular Expression Usage
>>> YANG uses regular expressions to restrict string values. Such a restriction 
>>> can be a part of a "pattern" statement or a string matching function. 
>>> [RFC7950] specifies that YANG regular expressions will conform to Appendix 
>>> F in [XSD-TYPES].
>>> YANG models have been implemented in many different environments and the 
>>> XSD variant of the regular expressions is not supported in many of these 
>>> environments. There are currently more than a dozen popular regular 
>>> expression variants implemented in various environments. While the usage of 
>>> the XSD variant of regular expression described in [RFC7950] remains the 
>>> preferred standard, a few conventions are prescribed to maximize the 
>>> portability of YANG models between environments.
>>>
>> 
>> I strongly disagree with this statement. The standard format is XSD
>> regular expressions. RFC 7950 section 9.4.5:
>> 
>>     The "pattern" statement, which is an optional substatement to the
>>     "type" statement, takes as an argument a regular expression string,
>>     as defined in [XSD-TYPES].
>> 
>> There is no notion of a 'preferred' standard.
>> 
>>> 1.1. Regular Expression Variant Choice Precedence
>>> YANG model designers SHOULD use the most portable syntax whenever possible. 
>>> Under the condition that XSD compliance is satisfied and there are multiple 
>>> choices for a given expression, the following precedence SHOULD be used to 
>>> choose a regular expression variant:
>>>
>>> o    POSIX base
>>>
>>> o    POSIX extended
>>>
>>> o    BSD
>>>
>>> o    GNU Regular Expression Extensions
>>>
>>> o    C++ Regular Expressions with std::regex
>>>
>>> o    Others
>> 
>> Strongly disagree. You either write YANG or something different. There
>> is no way to recognize what kind of regular expressions have been used
>> by the model designer. The value of a standard is that everybody does
>> the same.
>> 
>>> For example, either \d or [0-9] can be used with equivalent semantics and 
>>> they are both compliant with [XSD-TYPES]. [0-9] is recommended because [0-9]
>>> is supported by POSIX base but \d is not.
>>>
>>> 1.2.  Convention Guidelines
>>> 1.2.1. Avoid Character Category Escapes
>>> For example, in XSD regular expressions, \d is a Character Category Escape
>>> denoting the range of digits, i.e.,  [0-9]. To maximize portability, the 
>>> model designers SHOULD use [0-9] instead of \d.
>>>
>>> 1.2.2. Avoid Unicode Characters
>>> Unicode characters are allowed in XSD regular expressions, but are not 
>>> supported in the POSIX variant. If possible, the model designers SHOULD 
>>> avoid using Unicode characters, such as: \p{L} and \p{N}.
>>>
>>> 1.3. Conversion Tools
>>> Tools can automatically convert regular expressions from one variant to 
>>> another. When a YANG model is implemented in an environment where XSD 
>>> regular expressions are not supported, the recommended approach is to use a 
>>> conversion tool. For example, if needed, anchor position characters, i.e., 
>>> '^' and '$', can be added by a regular expression conversion tool.
>> 
>> If conversion tools exist that can convert, then by all means use XSD
>> in the YANG model and use tools to convert to whatever format your
>> implementation prefers to use.
>> 
>> /js
>> 
>
> _______________________________________________
> netmod mailing list
> netmod@ietf.org
> https://www.ietf.org/mailman/listinfo/netmod

-- 
Ladislav Lhotka
Head, CZ.NIC Labs
PGP Key ID: 0xB8F92B08A9F76C67
