Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-08 Thread Lou Berger
Kent and I discussed this.  We (as chairs) don't think there is
currently WG consensus on RegEx guidelines.  We do think there is
sufficient interest to continue the discussion, and would like to do so
both on list and in our next meeting in Singapore.

Thank you,

Lou and Kent

On 9/6/2017 1:01 PM, Lou Berger wrote:
> Thanks Rob.  I'll get with Kent, and then one of us will get back to the WG
> on next steps.
>
> Lou
>
>
> On September 6, 2017 3:53:33 AM Robert Wilton  wrote:
>
>> Hi Lou,
>>
>> This is the addition to 6087bis that I propose.   Note, this is the same
>> text in my email on the 31st of August.
>>
>> I propose adding the following 2 paragraphs to 6087bis section on
>> pattern and ranges:
>>
>> NEW:
>> To ensure patterns are easy to read and implement, authors SHOULD
>> restrict themselves to the parts of the XML schema regular expression
>> language that are common across most regular expression languages.  In
>> particular, pattern statements SHOULD avoid using 'character class
>> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
>> unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
>> They MAY use the '\d', '\w', '\s' character class shorthands and their
>> negated versions, where appropriate, but SHOULD avoid other character
>> class shorthands.  To match ASCII digits 0-9 the character class
>> '[0-9]' MUST be used instead of the '\d' character class shorthand
>> that matches Unicode digits in all scripts.
>>
>> Pattern statements do not have to strictly restrict numerical values,
>> and a simple less specific pattern may be preferable over a more
>> complex and precise pattern, e.g. as illustrated in the
>> 'ipv4-address-no-zone' example pattern below.
>>
>>
>> Or, put in the context of the existing 6087bis text:
>>
>> *** Patterns and Ranges
>>
>> For string data types, if a machine-readable pattern
>> can be defined for the desired semantics, then
>> one or more pattern statements SHOULD be present.
>> A single quoted string SHOULD be used to specify the pattern,
>> since a double-quoted string can modify the content.
>>
>> To ensure patterns are easy to read and implement, authors SHOULD
>> restrict themselves to the parts of the XML schema regular expression
>> language that are common across most regular expression languages.  In
>> particular, pattern statements SHOULD avoid using 'character class
>> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
>> unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
>> They MAY use the '\d', '\w', '\s' character class shorthands and their
>> negated versions, where appropriate, but SHOULD avoid other character
>> class shorthands.  To match ASCII digits 0-9 the character class
>> '[0-9]' MUST be used instead of the '\d' character class shorthand
>> that also matches Unicode digits in all scripts.
>>
>> Pattern statements do not have to strictly restrict numerical values,
>> and a simple less specific pattern may be preferable over a more
>> complex and precise pattern, e.g. as illustrated in the
>> 'ipv4-address-no-zone' example pattern below.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "pattern" statement:
>>
>>      typedef ipv4-address-no-zone {
>>        type inet:ipv4-address {
>>          pattern '[0-9\.]*';
>>        }
>>        ...
>>      }
>>
>> For string data types, if the length of the string
>> is required to be bounded in all implementations,
>> then a length statement MUST be present.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "length" statement:
>>
>>      typedef yang-identifier {
>>        type string {
>>          length "1..max";
>>          pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
>>          pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
>>        }
>>        ...
>>      }
>>
>> For numeric data types, if the values allowed
>> by the intended semantics are different than
>> those allowed by the unbounded intrinsic data
>> type (e.g., 'int32'), then a range statement SHOULD be present.
>>
>> The following typedef from ^RFC6991^ demonstrates the proper
>> use of the "range" statement:
>>
>>      typedef dscp {
>>        type uint8 {
>>          range "0..63";
>>        }
>>        ...
>>      }
>>
>> Thanks,
>> Rob
>>
>>
>> On 05/09/2017 22:37, Lou Berger wrote:
>>> Rob,
>>>
>>> (as chair)
>>> On 9/5/2017 1:17 PM, Robert Wilton wrote:
>>>> However, I have thrown in the towel on my regex crusade.
>>> I'm sorry, I've lost the thread here a bit.  In order to gauge consensus
>>> on this topic, it would be helpful to send the latest text that you are
>>> proposing for inclusion in the bis.  If you are willing to do this,
>>> we can poll to see if there is/is not support for inclusion of this
>>> text.  Are you willing, i.e., can you send the current proposed text change?
>>>
>>> Thank you,
>>> Lou
>>>
>>> .
>>>
>>

___
netmod mailing list
netmod@ietf.or

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Andy Bierman
On Wed, Sep 6, 2017 at 2:16 AM, Robert Wilton  wrote:

>
>
> On 05/09/2017 19:00, Juergen Schoenwaelder wrote:
>
>> On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
>>
>>>> I believe that tools intended for general use should follow the YANG spec
>>>> literally.
>>>
>>> I don't fully agree.  I think that they only need to cover the parts of the
>>> YANG spec for the models that they are using (or might use). If nobody uses
>>> Unicode blocks then it doesn't really matter whether a given tool supports
>>> them or not.  It is always possible to caveat and add support for the
>>> missing bits later.  E.g. if I was writing a bespoke XPATH implementation
>>> for YANG then there is probably quite a lot of the XPATH spec that I would
>>> also leave out as well, and just concentrate on the parts that people
>>> actually use, or are likely to use.
>>>
>> If this is your understanding of standards, why do you want to define
>> a subset of XSD pattern based on your observation of what is used or
>> not used? Simply do not implement what you observe is not used. Why do
>> we need guidelines of constructs not to use so that they are not used?
>>
> My aims:
> 1) To make pattern statements in standard YANG models easier to comprehend.
> 2) So that implementations designed to only support standard YANG models
> can have more confidence that they don't need to support all of the Unicode
> properties and character blocks.
>
>

I do not agree that goal (1) is achieved by limiting the usage of the
pattern expression language.
IMO it is important to achieve the full interoperability that is possible
between tools that conform to the pattern definition language.  This is
true whether the language is XSD or some flavor of POSIX or whatever.  It
is valuable for readers, writers, and tool-makers to know that all tools
that conform to the standard use the same pattern expression language.

I do not agree that (2) should be a goal of the standard.
If tool-makers have a false expectation that they can use the parser for
any pattern expression language, then tools will be fragile. The


Andy



>> There are multiple contradictions in your posts, one of them was the
>> idea of translating unicode matching to ASCII - which simply does not
>> work.
>>
> This does work if your implementation is willing to be restricted to only
> supporting ASCII.  Some users of YANG seem to think that ASCII is
> sufficient to configure and manage network devices.  My personal opinion is
> that they are probably broadly right, but there are some places where
> supporting a unicode string is better (e.g. the interface description
> leaf).  However, in these cases I think that either no pattern statement is
> required, or otherwise \w,\s,\d are probably sufficient.
>
> I understand, and agree, that an implementation that restricts pattern
> statement support to only ASCII strings makes the implementation
> non-compliant with the YANG spec.
>
>
>> Or the post where you said \d is OK but then later said \d is
>> not OK since it translates to a large number of numeric characters.
>>
> Yes, my opinion changed when I found out that '\d' covers more than just
> ASCII.  As per the 6087bis text that I sent out, I think that '\d' can be
> used, but must not be used if the regex is meant to only match ASCII 0 to
> 9.  My concern is that many readers/authors/implementors of YANG models may
> not properly understand that '\d' also covers digits in other
> Unicode scripts, and hence I think that it is clearer (and hence better)
> to use '[0-9]' in pattern statements instead, since the interpretation of
> that is entirely unambiguous.
>
>
>> You really need to sort out what you want, what the problem is you are
>> trying to solve, how you select the subset of XSD pattern etc. Write
>> an I-D.
>>
> Do you think that writing an I-D, that would contain the same arguments
> that I've presented here, would sway your opinion at all?
>
> My assumption is that it wouldn't, and hence writing up an I-D would seem to
> be a waste of effort.
>
>> And at the end, people who only do POSIX regular expressions,
>> because they come with the standard C library on POSIX systems or
>> whatever the reason really is, still will either have to continue to
>> cheat by silently interpreting XSD pattern as POSIX pattern or they
>> create a proper new statement to at least properly distinguish
>> different pattern languages.
>>
> Sure, but I don't regard either of these as good long term solutions.
>
> Thanks,
> Rob
>
>
>> /js
>>
>>
> ___
> netmod mailing list
> netmod@ietf.org
> https://www.ietf.org/mailman/listinfo/netmod
>
___
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Lou Berger
Thanks Rob.  I'll get with Kent, and then one of us will get back to the WG
on next steps.


Lou


On September 6, 2017 3:53:33 AM Robert Wilton  wrote:


Hi Lou,

This is the addition to 6087bis that I propose.   Note, this is the same
text in my email on the 31st of August.

I propose adding the following 2 paragraphs to 6087bis section on
pattern and ranges:

NEW:
To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand
that matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.


Or, put in the context of the existing 6087bis text:

*** Patterns and Ranges

For string data types, if a machine-readable pattern
can be defined for the desired semantics, then
one or more pattern statements SHOULD be present.
A single quoted string SHOULD be used to specify the pattern,
since a double-quoted string can modify the content.

To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand
that also matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "pattern" statement:

     typedef ipv4-address-no-zone {
       type inet:ipv4-address {
         pattern '[0-9\.]*';
       }
       ...
     }

For string data types, if the length of the string
is required to be bounded in all implementations,
then a length statement MUST be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "length" statement:

     typedef yang-identifier {
       type string {
         length "1..max";
         pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
         pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
       }
       ...
     }

For numeric data types, if the values allowed
by the intended semantics are different than
those allowed by the unbounded intrinsic data
type (e.g., 'int32'), then a range statement SHOULD be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "range" statement:

     typedef dscp {
       type uint8 {
         range "0..63";
       }
       ...
     }

Thanks,
Rob


On 05/09/2017 22:37, Lou Berger wrote:

Rob,

(as chair)
On 9/5/2017 1:17 PM, Robert Wilton wrote:

However, I have thrown in the towel on my regex crusade.

I'm sorry, I've lost the thread here a bit.  In order to gauge consensus
on this topic, it would be helpful to send the latest text that you are
proposing for inclusion in the bis.  If you are willing to do this,
we can poll to see if there is/is not support for inclusion of this
text.  Are you willing, i.e., can you send the current proposed text change?

Thank you,
Lou

.









Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Robert Wilton



On 05/09/2017 19:00, Juergen Schoenwaelder wrote:

> On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
>
>>> I believe that tools intended for general use should follow the YANG spec
>>> literally.
>>
>> I don't fully agree.  I think that they only need to cover the parts of the
>> YANG spec for the models that they are using (or might use). If nobody uses
>> Unicode blocks then it doesn't really matter whether a given tool supports
>> them or not.  It is always possible to caveat and add support for the
>> missing bits later.  E.g. if I was writing a bespoke XPATH implementation
>> for YANG then there is probably quite a lot of the XPATH spec that I would
>> also leave out as well, and just concentrate on the parts that people
>> actually use, or are likely to use.
>
> If this is your understanding of standards, why do you want to define
> a subset of XSD pattern based on your observation of what is used or
> not used? Simply do not implement what you observe is not used. Why do
> we need guidelines of constructs not to use so that they are not used?

My aims:
1) To make pattern statements in standard YANG models easier to comprehend.
2) So that implementations designed to only support standard YANG models 
can have more confidence that they don't need to support all of the 
Unicode properties and character blocks.




> There are multiple contradictions in your posts, one of them was the
> idea of translating unicode matching to ASCII - which simply does not
> work.

This does work if your implementation is willing to be restricted to
only supporting ASCII.  Some users of YANG seem to think that ASCII is
sufficient to configure and manage network devices.  My personal opinion
is that they are probably broadly right, but there are some places where
supporting a Unicode string is better (e.g. the interface description
leaf).  However, in these cases I think that either no pattern statement
is required, or otherwise \w,\s,\d are probably sufficient.

I understand, and agree, that an implementation that restricts pattern
statement support to only ASCII strings makes the implementation
non-compliant with the YANG spec.




> Or the post where you said \d is OK but then later said \d is
> not OK since it translates to a large number of numeric characters.

Yes, my opinion changed when I found out that '\d' covers more than just
ASCII.  As per the 6087bis text that I sent out, I think that '\d' can
be used, but must not be used if the regex is meant to only match ASCII
0 to 9.  My concern is that many readers/authors/implementors of YANG
models may not properly understand that '\d' also covers
digits in other Unicode scripts, and hence I think that it is clearer
(and hence better) to use '[0-9]' in pattern statements instead, since
the interpretation of that is entirely unambiguous.




> You really need to sort out what you want, what the problem is you are
> trying to solve, how you select the subset of XSD pattern etc. Write
> an I-D.

Do you think that writing an I-D, that would contain the same arguments
that I've presented here, would sway your opinion at all?

My assumption is that it wouldn't, and hence writing up an I-D would seem
to be a waste of effort.



> And at the end, people who only do POSIX regular expressions,
> because they come with the standard C library on POSIX systems or
> whatever the reason really is, still will either have to continue to
> cheat by silently interpreting XSD pattern as POSIX pattern or they
> create a proper new statement to at least properly distinguish
> different pattern languages.

Sure, but I don't regard either of these as good long term solutions.

Thanks,
Rob



/js





Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Robert Wilton



On 06/09/2017 09:33, Ladislav Lhotka wrote:

> Robert Wilton wrote on Wed, 06 Sep 2017 at 08:52 +0100:
>
>> Hi Lou,
>>
>> This is the addition to 6087bis that I propose.   Note, this is the same
>> text in my email on the 31st of August.
>>
>> I propose adding the following 2 paragraphs to 6087bis section on
>> pattern and ranges:
>>
>> NEW:
>> To ensure patterns are easy to read and implement, authors SHOULD
>> restrict themselves to the parts of the XML schema regular expression
>> language that are common across most regular expression languages.  In
>> particular, pattern statements SHOULD avoid using 'character class
>> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
>> unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
>> They MAY use the '\d', '\w', '\s' character class shorthands and their
>> negated versions, where appropriate, but SHOULD avoid other character
>> class shorthands.  To match ASCII digits 0-9 the character class
>
> I don't agree, things like \p{L} may be useful, at least in this part of the
> world.
>
> Moreover, \w means "any Unicode character not defined as punctuation, separator,
> or other" in YANG, but it may mean something else in a programming language,
> perhaps also depending on locale setting. This is a slippery slope, developers
> should not assume they can take a regex from YANG, enclose it in ^..$ and then
> feed into a RE-matching function.

For clarity:

I am not suggesting that they pass the pattern statement directly into 
the local regex engine.  Instead, I am suggesting that they convert it 
first (which may involve expanding \w to the equivalent XSD RE character 
class) before feeding it into the RE-matching function.  Whatever RE 
engine is used to interpret the pattern statement, the result must be 
the same as if an XSD RE engine was used (as defined by the YANG spec).
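
A minimal sketch of such a conversion step, assuming Python's "re" module
as the local engine. Nothing here was proposed on the list; the helper
names (build_xsd_word_ranges, xsd_to_python, xsd_match) are hypothetical.
It only expands '\w' to the XSD definition (every character whose Unicode
general category is not P*, Z* or C*), escapes '^' and '$' (which are
ordinary characters in XSD patterns), and relies on fullmatch() for the
implicit anchoring. Character class subtraction, '\p{...}', '\W', '\s',
'\i' and '\c' are passed through untranslated and would need similar care.

```python
import re
import sys
import unicodedata


def build_xsd_word_ranges() -> str:
    # Body of a bracket expression covering every code point whose general
    # category is not P* (punctuation), Z* (separator) or C* (other).
    parts = []
    start = None
    for cp in range(sys.maxunicode + 1):
        is_word = unicodedata.category(chr(cp))[0] not in "PZC"
        if is_word and start is None:
            start = cp
        elif not is_word and start is not None:
            parts.append(f"\\U{start:08x}-\\U{cp - 1:08x}")
            start = None
    if start is not None:
        parts.append(f"\\U{start:08x}-\\U{sys.maxunicode:08x}")
    return "".join(parts)


XSD_WORD = build_xsd_word_ranges()


def xsd_to_python(pattern: str) -> str:
    # Translate the handled subset of an XSD pattern into Python syntax.
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            esc = pattern[i + 1]
            if esc == "w":
                # Expand '\w' instead of trusting the local engine's meaning.
                out.append(XSD_WORD if in_class else f"[{XSD_WORD}]")
            else:
                out.append(ch + esc)  # other escapes: passed through as-is
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch in "^$" and not in_class:
            ch = "\\" + ch  # literal characters in XSD, anchors in Python
        out.append(ch)
        i += 1
    return "".join(out)


def xsd_match(pattern: str, value: str) -> bool:
    # XSD patterns are implicitly anchored, hence fullmatch().
    return re.fullmatch(xsd_to_python(pattern), value) is not None
```

With this sketch, xsd_match(r'\w+', 'a_b') is False, because '_' is
connector punctuation, whereas Python's native '\w+' accepts the same
string.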


Thanks,
Rob




Lada


'[0-9]' MUST be used instead of the '\d' character class shorthand
that matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.


Or, put in the context of the existing 6087bis text:

*** Patterns and Ranges

For string data types, if a machine-readable pattern
can be defined for the desired semantics, then
one or more pattern statements SHOULD be present.
A single quoted string SHOULD be used to specify the pattern,
since a double-quoted string can modify the content.

To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand
that also matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "pattern" statement:

  typedef ipv4-address-no-zone {
    type inet:ipv4-address {
      pattern '[0-9\.]*';
    }
    ...
  }

For string data types, if the length of the string
is required to be bounded in all implementations,
then a length statement MUST be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "length" statement:

  typedef yang-identifier {
    type string {
      length "1..max";
      pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
      pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
    }
    ...
  }

For numeric data types, if the values allowed
by the intended semantics are different than
those allowed by the unbounded intrinsic data
type (e.g., 'int32'), then a range statement SHOULD be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "range" statement:

  typedef dscp {
    type uint8 {
      range "0..63";
    }
    ...
  }

Thanks,
Rob


On 05/09/2017 22:37, Lou Berger wrote:

Rob,

(as chair)
On 9/5/2017 1:17 PM, Robert Wilton wrote:

However, I have thrown in the towel on my regex crusade.

I'm sorry, I've lost the thread here a bit.  In order to gauge consensus
on this topic, it would be helpful to send the latest text that you are
proposing for inclusion in the bis.  If you are willing to do this,
we can poll to see if there is/is not support

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Ladislav Lhotka
Robert Wilton wrote on Wed, 06 Sep 2017 at 08:52 +0100:
> Hi Lou,
> 
> This is the addition to 6087bis that I propose.   Note, this is the same 
> text in my email on the 31st of August.
> 
> I propose adding the following 2 paragraphs to 6087bis section on 
> pattern and ranges:
> 
> NEW:
> To ensure patterns are easy to read and implement, authors SHOULD
> restrict themselves to the parts of the XML schema regular expression
> language that are common across most regular expression languages.  In
> particular, pattern statements SHOULD avoid using 'character class
> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
> unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
> They MAY use the '\d', '\w', '\s' character class shorthands and their
> negated versions, where appropriate, but SHOULD avoid other character
> class shorthands.  To match ASCII digits 0-9 the character class

I don't agree, things like \p{L} may be useful, at least in this part of the
world.

Moreover, \w means "any Unicode character not defined as punctuation, separator,
or other" in YANG, but it may mean something else in a programming language,
perhaps also depending on locale setting. This is a slippery slope, developers
should not assume they can take a regex from YANG, enclose it in ^..$ and then
feed into a RE-matching function.
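
To make the '\w' difference concrete, here is a small sketch, assuming
Python's "re" module as the example of "a programming language". The
helper names are hypothetical. The XSD side is modelled directly from the
definition quoted above (general category not P*, Z* or C*) rather than
by running an XSD engine.

```python
import re
import unicodedata


def xsd_word_char(ch: str) -> bool:
    # XSD '\w': any character that is not punctuation (P*),
    # separator (Z*) or other (C*).
    return unicodedata.category(ch)[0] not in "PZC"


def python_word_char(ch: str) -> bool:
    # Python '\w' on str patterns: Unicode alphanumerics plus underscore.
    return re.fullmatch(r"\w", ch) is not None


if __name__ == "__main__":
    for ch in ("a", "_", "+", "$", "-"):
        print(repr(ch), unicodedata.category(ch),
              xsd_word_char(ch), python_word_char(ch))
```

The two definitions disagree on '_' (category Pc, so not an XSD word
character, but accepted by Python) and on symbols such as '+' and '$'
(categories Sm and Sc, so XSD word characters, but rejected by Python).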

Lada

> '[0-9]' MUST be used instead of the '\d' character class shorthand
> that matches Unicode digits in all scripts.
> 
> Pattern statements do not have to strictly restrict numerical values,
> and a simple less specific pattern may be preferable over a more
> complex and precise pattern, e.g. as illustrated in the
> 'ipv4-address-no-zone' example pattern below.
> 
> 
> Or, put in the context of the existing 6087bis text:
> 
> *** Patterns and Ranges
> 
> For string data types, if a machine-readable pattern
> can be defined for the desired semantics, then
> one or more pattern statements SHOULD be present.
> A single quoted string SHOULD be used to specify the pattern,
> since a double-quoted string can modify the content.
> 
> To ensure patterns are easy to read and implement, authors SHOULD
> restrict themselves to the parts of the XML schema regular expression
> language that are common across most regular expression languages.  In
> particular, pattern statements SHOULD avoid using 'character class
> subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
> unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
> They MAY use the '\d', '\w', '\s' character class shorthands and their
> negated versions, where appropriate, but SHOULD avoid other character
> class shorthands.  To match ASCII digits 0-9 the character class
> '[0-9]' MUST be used instead of the '\d' character class shorthand
> that also matches Unicode digits in all scripts.
> 
> Pattern statements do not have to strictly restrict numerical values,
> and a simple less specific pattern may be preferable over a more
> complex and precise pattern, e.g. as illustrated in the
> 'ipv4-address-no-zone' example pattern below.
> 
> The following typedef from ^RFC6991^ demonstrates the proper
> use of the "pattern" statement:
> 
>  typedef ipv4-address-no-zone {
>    type inet:ipv4-address {
>      pattern '[0-9\.]*';
>    }
>    ...
>  }
> 
> For string data types, if the length of the string
> is required to be bounded in all implementations,
> then a length statement MUST be present.
> 
> The following typedef from ^RFC6991^ demonstrates the proper
> use of the "length" statement:
> 
>  typedef yang-identifier {
>    type string {
>      length "1..max";
>      pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
>      pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
>    }
>    ...
>  }
> 
> For numeric data types, if the values allowed
> by the intended semantics are different than
> those allowed by the unbounded intrinsic data
> type (e.g., 'int32'), then a range statement SHOULD be present.
> 
> The following typedef from ^RFC6991^ demonstrates the proper
> use of the "range" statement:
> 
>  typedef dscp {
>    type uint8 {
>      range "0..63";
>    }
>    ...
>  }
> 
> Thanks,
> Rob
> 
> 
> On 05/09/2017 22:37, Lou Berger wrote:
> > Rob,
> > 
> > (as chair)
> > On 9/5/2017 1:17 PM, Robert Wilton wrote:
> > > However, I have thrown in the towel on my regex crusade.
> > 
> > I'm sorry, I've lost the thread here a bit.  In order to gauge consensus
> > on this topic, it would be helpful to send the latest text that you are
> > proposing for inclusion in the bis.  If you are willing to do this,
> > we can poll to see if there is/is not support for inclusion of this
> > text.  Are you willing, i.e., can you send the current proposed text change?
> > 
> > Thank you,
> > Lou
> > 
> > .
> > 
> 
> 
-- 
Ladislav Lhotka
Head, CZ.NIC Labs
PGP Key ID: 0xB8F92B08A9F76C67


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Robert Wilton

Hi Lou,

This is the addition to 6087bis that I propose.   Note, this is the same 
text in my email on the 31st of August.


I propose adding the following 2 paragraphs to 6087bis section on 
pattern and ranges:


NEW:
To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand
that matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.
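
The '\d' versus '[0-9]' distinction in the first paragraph can be seen
with a short sketch. It uses Python's "re" module as a stand-in, on the
assumption that its '\d' (on str patterns) behaves like the XSD shorthand
in matching any Unicode decimal digit; the function names and sample
strings are illustrative only.

```python
import re

# '\d' on a Python str pattern matches any Unicode decimal digit (Nd).
DIGIT_SHORTHAND = re.compile(r"\d+")
# '[0-9]' matches only the ten ASCII digits.
ASCII_CLASS = re.compile(r"[0-9]+")


def accepted_by_shorthand(value: str) -> bool:
    return DIGIT_SHORTHAND.fullmatch(value) is not None


def accepted_by_ascii_class(value: str) -> bool:
    return ASCII_CLASS.fullmatch(value) is not None


# The same number, 2017, written with digits from three scripts.
SAMPLES = {
    "ASCII": "2017",
    "Arabic-Indic": "\u0662\u0660\u0661\u0667",
    "Devanagari": "\u0968\u0966\u0967\u096d",
}

if __name__ == "__main__":
    for script, value in SAMPLES.items():
        print(script,
              accepted_by_shorthand(value),
              accepted_by_ascii_class(value))
```

All three samples are accepted by '\d+', but only the ASCII one is
accepted by '[0-9]+'.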


Or, put in the context of the existing 6087bis text:

*** Patterns and Ranges

For string data types, if a machine-readable pattern
can be defined for the desired semantics, then
one or more pattern statements SHOULD be present.
A single quoted string SHOULD be used to specify the pattern,
since a double-quoted string can modify the content.

To ensure patterns are easy to read and implement, authors SHOULD
restrict themselves to the parts of the XML schema regular expression
language that are common across most regular expression languages.  In
particular, pattern statements SHOULD avoid using 'character class
subtraction' (e.g. '[a-z-[aeiou]]').  They SHOULD avoid matching
unicode properties and blocks (e.g. '\p{L} or \p{IsBasic_Latin}').
They MAY use the '\d', '\w', '\s' character class shorthands and their
negated versions, where appropriate, but SHOULD avoid other character
class shorthands.  To match ASCII digits 0-9 the character class
'[0-9]' MUST be used instead of the '\d' character class shorthand
that also matches Unicode digits in all scripts.

Pattern statements do not have to strictly restrict numerical values,
and a simple less specific pattern may be preferable over a more
complex and precise pattern, e.g. as illustrated in the
'ipv4-address-no-zone' example pattern below.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "pattern" statement:

    typedef ipv4-address-no-zone {
      type inet:ipv4-address {
        pattern '[0-9\.]*';
      }
      ...
    }

For string data types, if the length of the string
is required to be bounded in all implementations,
then a length statement MUST be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "length" statement:

    typedef yang-identifier {
      type string {
        length "1..max";
        pattern '[a-zA-Z_][a-zA-Z0-9\-_.]*';
        pattern '.|..|[^xX].*|.[^mM].*|..[^lL].*';
      }
      ...
    }
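
As an illustration of how the two pattern statements in this typedef
combine (YANG ANDs multiple pattern statements on a type), here is a
sketch in Python. The function name is hypothetical, and Python's "re" is
assumed to agree with XSD semantics for these two ASCII-only patterns. The
first pattern enforces identifier syntax; the second rejects any string
that starts with "xml" in any mix of upper and lower case.

```python
import re

# First pattern: a letter or underscore, then letters, digits, '-', '_', '.'.
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9\-_.]*")
# Second pattern: any string that does NOT begin with 'xml' (case-insensitive).
NOT_XML = re.compile(r".|..|[^xX].*|.[^mM].*|..[^lL].*")


def is_yang_identifier(value: str) -> bool:
    # length "1..max", and both patterns must match the whole string.
    return (len(value) >= 1
            and IDENT.fullmatch(value) is not None
            and NOT_XML.fullmatch(value) is not None)


if __name__ == "__main__":
    for candidate in ("interface", "xm", "xmas", "xml-foo", "XmLdata", "9abc"):
        print(candidate, is_yang_identifier(candidate))
```

Neither pattern needs anything beyond plain character classes and
alternation, so the same expressions work unchanged in most regex engines.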

For numeric data types, if the values allowed
by the intended semantics are different than
those allowed by the unbounded intrinsic data
type (e.g., 'int32'), then a range statement SHOULD be present.

The following typedef from ^RFC6991^ demonstrates the proper
use of the "range" statement:

    typedef dscp {
      type uint8 {
        range "0..63";
      }
      ...
    }

Thanks,
Rob


On 05/09/2017 22:37, Lou Berger wrote:

Rob,

(as chair)
On 9/5/2017 1:17 PM, Robert Wilton wrote:

However, I have thrown in the towel on my regex crusade.

I'm sorry, I've lost the thread here a bit.  In order to gauge consensus
on this topic, it would be helpful to send the latest text that you are
proposing for inclusion in the bis.  If you are willing to do this,
we can poll to see if there is/is not support for inclusion of this
text.  Are you willing, i.e., can you send the current proposed text change?

Thank you,
Lou




___
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-06 Thread Ladislav Lhotka
Juergen Schoenwaelder wrote on Tue, 05 Sep 2017 at 20:00 +0200:
> On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
> > 
> > > I believe that tools intended for general use should follow the YANG spec
> > > literally.
> > 
> > I don't fully agree.  I think that they only need to cover the parts of the
> > YANG spec for the models that they are using (or might use). If nobody uses
> > Unicode blocks then it doesn't really matter whether a given tool supports
> > them or not.  It is always possible to caveat and add support for the
> > missing bits later.  E.g. if I was writing a bespoke XPATH implementation
> > for YANG then there is probably quite a lot of the XPATH spec that I would
> > also leave out as well, and just concentrate on the parts that people
> > actually use, or are likely to use.
> > 
> 
> If this is your understanding of standards, why do you want to define
> a subset of XSD pattern based on your observation of what is used or
> not used? Simply do not implement what you observe is not used. Why do
> we need guidelines of constructs not to use so that they are not used?

Agreed. Also, XPath is different: due to the restrictions imposed on data trees
modelled with YANG compared to general XML (no mixed content, no document order)
some XPath features don't make sense by definition. This is not the case for
Unicode strings.

Lada

> 
> There are multiple contradictions in your posts, one of them was the
> idea of translating unicode matching to ASCII - which simply does not
> work. Or the post where you said \d is OK but then later said \d is
> not OK since it translates to a large number of numeric characters.
> You really need to sort out what you want, what the problem is you are
> trying to solve, how you select the subset of XSD pattern etc. Write
> an I-D. And at the end, people who only do POSIX regular expressions,
> because they come with the standard C library on POSIX systems or
> whatever the reason really is, still will either have to continue to
> cheat by silently interpreting XSD pattern as POSIX pattern or they
> create a proper new statement to at least properly distinguish
> different pattern languages.
> 
> /js
> 
-- 
Ladislav Lhotka
Head, CZ.NIC Labs
PGP Key ID: 0xB8F92B08A9F76C67



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-05 Thread Lou Berger
Rob,

(as chair)
On 9/5/2017 1:17 PM, Robert Wilton wrote:
> However, I have thrown in the towel on my regex crusade.

I'm sorry, I've lost the thread here a bit. In order to gauge consensus
on this topic, it would be helpful to send the latest text that you are
proposing for inclusion in the bis.  If you are willing to do this,
we can poll to see if there is/is not support for inclusion of this
text.  Are you willing, i.e., can you send the current proposed text change?

Thank you,
Lou



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-05 Thread Juergen Schoenwaelder
On Tue, Sep 05, 2017 at 06:17:09PM +0100, Robert Wilton wrote:
> 
> > I believe that tools intended for general use should follow the YANG spec
> > literally.
>
> I don't fully agree.  I think that they only need to cover the parts of the
> YANG spec for the models that they are using (or might use). If nobody uses
> Unicode blocks then it doesn't really matter whether a given tool supports
> them or not.  It is always possible to caveat and add support for the
> missing bits later.  E.g. if I was writing a bespoke XPATH implementation
> for YANG then there is probably quite a lot of the XPATH spec that I would
> also leave out as well, and just concentrate on the parts that people
> actually use, or are likely to use.
>

If this is your understanding of standards, why do you want to define
a subset of XSD pattern based on your observation of what is used or
not used? Simply do not implement what you observe is not used. Why do
we need guidelines of constructs not to use so that they are not used?

There are multiple contradictions in your posts, one of them was the
idea of translating unicode matching to ASCII - which simply does not
work. Or the post where you said \d is OK but then later said \d is
not OK since it translates to a large number of numeric characters.
You really need to sort out what you want, what the problem is you are
trying to solve, how you select the subset of XSD pattern etc. Write
an I-D. And at the end, people who only do POSIX regular expressions,
because they come with the standard C library on POSIX systems or
whatever the reason really is, still will either have to continue to
cheat by silently interpreting XSD pattern as POSIX pattern or they
create a proper new statement to at least properly distinguish
different pattern languages.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-05 Thread Robert Wilton



On 05/09/2017 17:35, Ladislav Lhotka wrote:

Robert Wilton wrote on Mon, 04 Sep 2017 at 17:07 +0100:

Hi Lada,

On 04/09/2017 15:59, Ladislav Lhotka wrote:

Robert Wilton wrote on Mon, 04 Sep 2017 at 15:05 +0100:

Hi Andy,

On 02/09/2017 17:46, Andy Bierman wrote:

On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder  wrote:

On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:

This is not an effort to change or bifurcate the YANG 1.1. It is
simply to
RECOMMEND a proper subset of XSD pattern that is more portable.


If you implement YANG as it is defined, patterns are portable. Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset do not
even agree on what that subset is. Perhaps people pushing for this
should go and write an I-D that explains why a 'more portable' subset
is needed (which problems are we fixing), that defines such a 'more
portable subset', and which includes the reasoning how the subset has
been determined.



I do not agree that the YANG pattern contains a string that is both a
POSIX and XSD regular expression.
The RFC is very clear it contains an XSD expression. Pretending it is
both is a hack that does not even seem
to work 100%, so it is not reliable.

   I am not suggesting that the YANG pattern is both a POSIX and XSD
regular expression.

I am only suggesting that the guidelines recommend that authors use a
subset of XSD, to make it easier to programmatically *convert* the 'XSD
subset compliant regular expression' into a functionally equivalent
regular expression for whatever regular expression engine the tooling
decides to use.

And that's the point, I think: each developer needs to get a library
function so
as to translate the XSD pattern into a native regex of whatever programming
language he/she is currently using. So I guess what we really need is to
identify libraries for common languages that do it correctly - or write
simple
translators ourselves if none is available.

Yes, exactly.

Looking at http://www.regular-expressions.info/ then XML RE does look
like a good standard choice of RE language for YANG pattern statements
because it is generally one of the most basic RE languages, and hence it
should be feasible to convert an XML RE into a form usable by most RE
languages.

Yes, and the XSD RE language was also designed for pretty much the same purpose
(data type system).


But converting some parts of the XML RE syntax would probably be laborious:

Unicode support is of course hairy but since YANG permits it in the string type
it makes sense that the pattern language follow suit.

RE flavours used in modern programming languages support Unicode, so the
translation should be doable (if it hasn't been done yet).

Yes. POSIX extended regex (the one proposed by OpenConfig) is the odd
one out here because it doesn't support Unicode.


Still I haven't seen any standards based RFC or IETF draft YANG models 
that need to match either unicode properties or blocks.  The IPv4/v6 
zone address uses them, but I suspect that '\w' would have been sufficient.





1) E.g. the unicode property '\p{Nd}' that is equivalent to '\d' matches
590 characters
(http://www.fileformat.info/info/unicode/category/Nd/list.htm). There
are approx 32 unicode properties, presumably these could also be
extended over time as well.
2) There are currently 105 unicode blocks, where each block is a
discrete range of characters (e.g. \p{InTibetan}: U+0F00–U+0FFF)
3) Handling the character class subtraction is also possible, but
probably tedious to implement, since it requires the translation to
fully understand the set of characters in the character class so it can
form an equivalent character class without any subtractions.

But now with the "invert-match" modifier in YANG 1.1, implementations have to be
able to perform such set differences anyway, right?

No.  Character class subtraction applies to a single character class 
match in the expression.  The "invert-match" applies to the whole regex 
check.  The same regex check can be performed and the boolean result 
reversed.
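
A minimal sketch of the point made above, assuming a Python-based validator: "invert-match" flips the boolean result of the whole pattern check and needs no set arithmetic on character classes. The function name and parameters are hypothetical.

```python
import re


def pattern_ok(pattern, value, invert_match=False):
    """Run the ordinary whole-string check, then reverse it if requested."""
    matched = re.fullmatch(pattern, value) is not None
    # XOR with the modifier: the regex itself is never rewritten.
    return matched != invert_match


if __name__ == "__main__":
    print(pattern_ok("[a-z]+", "abc"))                      # ordinary match
    print(pattern_ok("[a-z]+", "abc", invert_match=True))   # result reversed
    print(pattern_ok("[a-z]+", "ABC", invert_match=True))   # result reversed
```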





These were the three parts of the XML RE that I was hoping to discourage
in the YANG author guidelines so that performing a translation is much
easier.  Spotting these 3 parts in the regex should be simple, so the
translation would still be robust, even if not complete.

I believe that tools intended for general use should follow the YANG spec
literally.

I don't fully agree.  I think that they only need to cover the parts of 
the YANG spec for the models that they are using (or might use). If 
nobody uses Unicode blocks then it doesn't really matter whether a given 
tool supports them or not.  It is always possible to caveat and add 
support for the missing bits later.  E.g. if I was writing a bespoke 
XPATH implementation for YANG then there is probably quite a lot of the 
XPATH spec that I would also leave out as well, and just concentrate on 
the parts that

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-05 Thread Ladislav Lhotka
Robert Wilton wrote on Mon, 04 Sep 2017 at 17:07 +0100:
> Hi Lada,
> 
> On 04/09/2017 15:59, Ladislav Lhotka wrote:
> > Robert Wilton wrote on Mon, 04 Sep 2017 at 15:05 +0100:
> > > Hi Andy,
> > > 
> > > On 02/09/2017 17:46, Andy Bierman wrote:
> > > > On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder
> > > > <j.schoenwael...@jacobs-university.de> wrote:
> > > > > On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
> > > > > > This is not an effort to change or bifurcate the YANG 1.1. It is
> > > > > > simply to
> > > > > > RECOMMEND a proper subset of XSD pattern that is more portable.
> > > > > > 
> > > > > 
> > > > > If you implement YANG as it is defined, patterns are portable. Given
> > > > > this, I do not understand the notion of 'more portable'.
> > > > > 
> > > > > Anyway, it seems that those who want a more portable subset do not
> > > > > even agree on what that subset is. Perhaps people pushing for this
> > > > > should go and write an I-D that explains why a 'more portable' subset
> > > > > is needed (which problems are we fixing), that defines such a 'more
> > > > > portable subset', and which includes the reasoning how the subset has
> > > > > been determined.
> > > > > 
> > > > > 
> > > > 
> > > > I do not agree that the YANG pattern contains a string that is both a
> > > > POSIX and XSD regular expression.
> > > > The RFC is very clear it contains an XSD expression. Pretending it is
> > > > both is a hack that does not even seem
> > > > to work 100%, so it is not reliable.
> > > 
> > >   I am not suggesting that the YANG pattern is both a POSIX and XSD
> > > regular expression.
> > > 
> > > I am only suggesting that the guidelines recommend that authors use a
> > > subset of XSD, to make it easier to programmatically *convert* the 'XSD
> > > subset compliant regular expression' into a functionally equivalent
> > > regular expression for whatever regular expression engine the tooling
> > > decides to use.
> > 
> > And that's the point, I think: each developer needs to get a library
> > function so
> > as to translate the XSD pattern into a native regex of whatever programming
> > language he/she is currently using. So I guess what we really need is to
> > identify libraries for common languages that do it correctly - or write
> > simple
> > translators ourselves if none is available.
> 
> Yes, exactly.
> 
> Looking at http://www.regular-expressions.info/ then XML RE does look 
> like a good standard choice of RE language for YANG pattern statements 
> because it is generally one of the most basic RE languages, and hence it 
> should be feasible to convert an XML RE into a form usable by most RE 
> languages.

Yes, and the XSD RE language was also designed for pretty much the same purpose
(data type system).  

> 
> But converting some parts of the XML RE syntax would probably be laborious:

Unicode support is of course hairy but since YANG permits it in the string type
it makes sense that the pattern language follow suit. 

RE flavours used in modern programming languages support Unicode, so the
translation should be doable (if it hasn't been done yet).

> 1) E.g. the unicode property '\p{Nd}' that is equivalent to '\d' matches 
> 590 characters 
> (http://www.fileformat.info/info/unicode/category/Nd/list.htm). There 
> are approx 32 unicode properties, presumably these could also be 
> extended over time as well.
> 2) There are currently 105 unicode blocks, where each block is a 
> discrete range of characters (e.g. \p{InTibetan}: U+0F00–U+0FFF)
> 3) Handling the character class subtraction is also possible, but 
> probably tedious to implement, since it requires the translation to 
> fully understand the set of characters in the character class so it can 
> form an equivalent character class without any subtractions.

But now with the "invert-match" modifier in YANG 1.1, implementations have to be
able to perform such set differences anyway, right?

> These were the three parts of the XML RE that I was hoping to discourage 
> in the YANG author guidelines so that performing a translation is much 
> easier.  Spotting these 3 parts in the regex should be simple, so the 
> translation would still be robust, even if not complete.

I believe that tools intended for general use should follow the YANG spec
literally.

> 
> There are other conversions that may also need to be performed 
> (depending on the target RE engine):
> 1) Character class shorthands (e.g. \d, \w) need to be converted to 
> represent the Unicode set equivalent, since for a lot of engines they 
> only match ASCII characters.  For '\s' it must match ASCII whitespace only.

I think they should mean exactly what XSD spec says they mean.

> 2) If the engine supports greedy alternation (e.g. POSIX basic/extended 
> regex), then alternations need to be converted to an eager form if required.

Yes, and this is a subtle point that could otherwise be easily overlooked.

> 3) The syntax for escaping characters seems to differ in XML RE from 

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Andy Bierman
On Mon, Sep 4, 2017 at 9:22 AM, Robert Wilton  wrote:

>
>
> On 04/09/2017 16:55, Andy Bierman wrote:
>
>
>
> On Mon, Sep 4, 2017 at 7:05 AM, Robert Wilton  wrote:
>
>> Hi Andy,
>>
>> On 02/09/2017 17:46, Andy Bierman wrote:
>>
>>
>>
>> On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder <
>> j.schoenwael...@jacobs-university.de> wrote:
>>
>>> On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
>>> >
>>> > This is not an effort to change or bifurcate the YANG 1.1. It is
>>> simply to
>>> > RECOMMEND a proper subset of XSD pattern that is more portable.
>>> >
>>>
>>> If you implement YANG as it is defined, patterns are portable. Given
>>> this, I do not understand the notion of 'more portable'.
>>>
>>> Anyway, it seems that those who want a more portable subset do not
>>> even agree on what that subset is. Perhaps people pushing for this
>>> should go and write an I-D that explains why a 'more portable' subset
>>> is needed (which problems are we fixing), that defines such a 'more
>>> portable subset', and which includes the reasoning how the subset has
>>> been determined.
>>>
>>>
>>
>> I do not agree that the YANG pattern contains a string that is both a
>> POSIX and XSD regular expression.
>> The RFC is very clear it contains an XSD expression. Pretending it is
>> both is a hack that does not even seem
>> to work 100%, so it is not reliable.
>>
>> I am not suggesting that the YANG pattern is both a POSIX and XSD regular
>> expression.
>>
>> I am only suggesting that the guidelines recommend that authors use a
>> subset of XSD, to make it easier to programmatically *convert* the 'XSD
>> subset compliant regular expression' into a functionally equivalent regular
>> expression for whatever regular expression engine the tooling decides to
>> use.
>>
>>
> Looks like you want the expression to mean the exact same thing in
> multiple expression languages
> and you want to put the burden of this perfect subset on humans who write
> YANG.
>
> Again, no, that is not what I want.
>
> I would like the rules to recommend that authors of standards based YANG
> modules don't use the bits of the XML RE language that (i) they don't use
> anyway, (ii) don't appear to have any compelling use case in standard YANG
> modules, and (iii) are hard to convert to other RE languages.
>
> These recommendations also have the additional advantage that the pattern
> statements that follow these rules are likely to be much easier to
> understand because they use the aspects of regular expressions languages
> that folks are likely to be more familiar with.
>
> This is a really unworkable plan.
>
> Is my proposed 6087bis text really that complicated?
>


Yes -- way too complicated to burden all YANG module writers.
We did a study at Cisco around 2002 to find out why engineers were having
such a hard time learning to write MIB modules.  It turned out that all the
CLRs,
the "helpful" arcane rules on top of standard rules, were causing great
pain.
IMO the IETF should not create a new special variant of the definition in
XSD-TYPES,
which is what 'SHOULD NOT use' guidelines in 6087bis will do.

The pattern-stmt is like YANG XPath -- it is for describing the constraint.
An implementation is not required to use off the shelf tools to enforce the
constraint.
In this case (libxml2) there are widely-available tools that could be
leveraged.
If conversion to a different parser is the desired implementation choice,
then
that should be the tool-maker's problem. If a YANG module writer can be told
"convert A to B"
then so can a tool-maker.




> Thanks,
> Rob
>
>
Andy


>
>
>
>
>> E.g. this seems to be the approach used by "libyang" that uses libpcre as
>> the backend RE library rather than libxml.  Unfortunately, I think that the
>> libyang library would currently fail if the pattern statement contained
>> "[[A-Z]-[P-R]]" because it looks like the PCRE2 language does not support
>> character class subtraction.  AFAICT, no standard YANG modules currently
>> use character class subtraction, so the authors of libyang have a
>> choice here:
>>   (i) write a block of code that most likely nobody is going to use, or
>>   (ii) document the limitation, spot character class subtraction in the
>> regex, and flag that it is not supported (or perhaps just ignore it).
>>
>>
>>
>> If the community wants to support both XSD and POSIX expressions, then
>> the proper engineering
>> solution is to introduce a new statement that is defined to contain a
>> POSIX expression.
>> This can be done with a YANG extension now and added to YANG 2.0 later.
>>
>> I think that this is an inferior solution:
>> - there are many languages that YANG tools could be written in: C/C++,
>> Python, Java, Go, Rust, Javascript are all reasonably plausible choices.
>> - they all have similar, but with small differences regular expression
>> flavours (according to http://www.regular-expressions.info/reference.html
>> ).
>> - Personally, I see no inherent advantage of the POS

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Robert Wilton



On 04/09/2017 16:55, Andy Bierman wrote:



On Mon, Sep 4, 2017 at 7:05 AM, Robert Wilton wrote:


Hi Andy,


On 02/09/2017 17:46, Andy Bierman wrote:



On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder
<j.schoenwael...@jacobs-university.de> wrote:

On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee)
wrote:
>
> This is not an effort to change or bifurcate the YANG 1.1.
It is simply to
> RECOMMEND a proper subset of XSD pattern that is more portable.
>

If you implement YANG as it is defined, patterns are portable.
Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset
do not
even agree on what that subset is. Perhaps people pushing for
this
should go and write an I-D that explains why a 'more
portable' subset
is needed (which problems are we fixing), that defines such a
'more
portable subset', and which includes the reasoning how the
subset has
been determined.



I do not agree that the YANG pattern contains a string that is
both a POSIX and XSD regular expression.
The RFC is very clear it contains an XSD expression. Pretending
it is both is a hack that does not even seem
to work 100%, so it is not reliable.

I am not suggesting that the YANG pattern is both a POSIX and XSD
regular expression.

I am only suggesting that the guidelines recommend that authors
use a subset of XSD, to make it easier to programmatically
*convert* the 'XSD subset compliant regular expression' into a
functionally equivalent regular expression for whatever regular
expression engine the tooling decides to use.


Looks like you want the expression to mean the exact same thing in 
multiple expression languages
and you want to put the burden of this perfect subset on humans who 
write YANG.

Again, no, that is not what I want.

I would like the rules to recommend that authors of standards based YANG 
modules don't use the bits of the XML RE language that (i) they don't 
use anyway, (ii) don't appear to have any compelling use case in 
standard YANG modules, and (iii) are hard to convert to other RE languages.


These recommendations also have the additional advantage that the 
pattern statements that follow these rules are likely to be much easier 
to understand because they use the aspects of regular expressions 
languages that folks are likely to be more familiar with.



This is a really unworkable plan.

Is my proposed 6087bis text really that complicated?

Thanks,
Rob





E.g. this seems to be the approach used by "libyang" that uses
libpcre as the backend RE library rather than libxml.
Unfortunately, I think that the libyang library would currently
fail if the pattern statement contained "[[A-Z]-[P-R]]" because it
looks like the PCRE2 language does not support character class
subtraction.  AFAICT, no standard YANG modules currently use
character class subtraction, so the authors of libyang have a
choice here:
  (i) write a block of code that most likely nobody is going to
use, or
  (ii) document the limitation, spot character class subtraction
in the regex, and flag that it is not supported (or perhaps just
ignore it).




If the community wants to support both XSD and POSIX expressions,
then the proper engineering
solution is to introduce a new statement that is defined to
contain a POSIX expression.
This can be done with a YANG extension now and added to YANG 2.0
later.

I think that this is an inferior solution:
- there are many languages that YANG tools could be written in:
C/C++, Python, Java, Go, Rust, Javascript are all reasonably
plausible choices.
- they all have similar, but with small differences regular
expression flavours (according to
http://www.regular-expressions.info/reference.html
).
- Personally, I see no inherent advantage of the POSIX Extended
Regex over XML RE.   In fact, given that it doesn't support
Unicode at all, it would seem to be a somewhat strange choice for
a second pattern statement.
- Nor does it seem pragmatic to introduce lots of different
flavors of pattern statements into YANG each supporting a
different regex syntax.



You seem to be confirming that picking 1 flavor of Posix would be 
impossible.

All the more reason to keep the XSD pattern unburdened.
I see no reason XSD patterns should be constrained because some 
implementors want to

ignore the RFC and pretend the string is some other expression language.

I also don't like the solution that every YANG tool maker has to
either link against libxml2,  or write their own efficient regular
exp

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Robert Wilton

Hi Lada,

On 04/09/2017 15:59, Ladislav Lhotka wrote:

Robert Wilton wrote on Mon, 04 Sep 2017 at 15:05 +0100:

Hi Andy,

On 02/09/2017 17:46, Andy Bierman wrote:

On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder 
 wrote:

On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:

This is not an effort to change or bifurcate the YANG 1.1. It is simply to
RECOMMEND a proper subset of XSD pattern that is more portable.


If you implement YANG as it is defined, patterns are portable. Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset do not
even agree on what that subset is. Perhaps people pushing for this
should go and write an I-D that explains why a 'more portable' subset
is needed (which problems are we fixing), that defines such a 'more
portable subset', and which includes the reasoning how the subset has
been determined.




I do not agree that the YANG pattern contains a string that is both a POSIX and 
XSD regular expression.
The RFC is very clear it contains an XSD expression. Pretending it is both is a 
hack that does not even seem
to work 100%, so it is not reliable.

  I am not suggesting that the YANG pattern is both a POSIX and XSD regular 
expression.

I am only suggesting that the guidelines recommend that authors use a subset of 
XSD, to make it easier to programmatically *convert* the 'XSD subset compliant 
regular expression' into a functionally equivalent regular expression for 
whatever regular expression engine the tooling decides to use.

And that's the point, I think: each developer needs to get a library function so
as to translate the XSD pattern into a native regex of whatever programming
language he/she is currently using. So I guess what we really need is to
identify libraries for common languages that do it correctly - or write simple
translators ourselves if none is available.

Yes, exactly.

Looking at http://www.regular-expressions.info/ then XML RE does look 
like a good standard choice of RE language for YANG pattern statements 
because it is generally one of the most basic RE languages, and hence it 
should be feasible to convert an XML RE into a form usable by most RE 
languages.


But converting some parts of the XML RE syntax would probably be laborious:
1) E.g. the unicode property '\p{Nd}' that is equivalent to '\d' matches 
590 characters 
(http://www.fileformat.info/info/unicode/category/Nd/list.htm). There 
are approx 32 unicode properties, presumably these could also be 
extended over time as well.
2) There are currently 105 unicode blocks, where each block is a 
discrete range of characters (e.g. \p{InTibetan}: U+0F00–U+0FFF)
3) Handling the character class subtraction is also possible, but 
probably tedious to implement, since it requires the translation to 
fully understand the set of characters in the character class so it can 
form an equivalent character class without any subtractions.
These were the three parts of the XML RE that I was hoping to discourage 
in the YANG author guidelines so that performing a translation is much 
easier.  Spotting these 3 parts in the regex should be simple, so the 
translation would still be robust, even if not complete.
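
As a rough sketch of how a tool might spot the three discouraged constructs listed above, the following Python helper scans a pattern string with simple heuristics. It is an assumption-laden illustration, not a real XSD regex parser: the names `CHECKS` and `discouraged_constructs` are hypothetical, block names are assumed to start with "Is" or "In", and escaped backslashes (e.g. a literal `\\p{`) would fool it.

```python
import re

# Naive textual heuristics, checked in order.
CHECKS = [
    ("Unicode block", re.compile(r"\\[pP]\{(?:Is|In)")),
    ("Unicode property", re.compile(r"\\[pP]\{(?!Is|In)")),
    ("character class subtraction", re.compile(r"(?<!\\)-\[")),
]


def discouraged_constructs(pattern):
    """Return the names of discouraged constructs that appear in the pattern."""
    return [name for name, check in CHECKS if check.search(pattern)]


if __name__ == "__main__":
    for sample in (r"[a-z-[aeiou]]", r"\p{L}+", r"\p{IsBasicLatin}", r"[0-9\.]*"):
        print(sample, discouraged_constructs(sample))
```

A tool could use such a check to flag an unsupported pattern rather than silently misinterpreting it.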


There are other conversions that may also need to be performed 
(depending on the target RE engine):
1) Character class shorthands (e.g. \d, \w) need to be converted to 
represent the Unicode set equivalent, since for a lot of engines they 
only match ASCII characters.  For '\s' it must match ASCII whitespace only.
2) If the engine supports greedy alternation (e.g. POSIX basic/extended 
regex), then alternations need to be converted to an eager form if required.
3) The syntax for escaping characters seems to differ in XML RE from 
other common languages.

4) Linebreak match handling seems to differ.
These conversions would need to be done regardless, but would seem to be 
much quicker/simpler to implement than the ones above.


Thanks,
Rob





E.g. this seems to be the approach used by "libyang" that uses libpcre as the backend RE 
library rather than libxml.  Unfortunately, I think that the libyang library would currently fail 
if the pattern statement contained "[[A-Z]-[P-R]]" because it looks like the PCRE2 
language does not support character class subtraction.  AFAICT, no standard YANG modules currently 
use character class subtraction, so the authors of libyang have a choice here:

Note that your example is incorrect, it should be [A-Z-[P-R]]. FWIW, Python
module PyXB (that I used in Yangson library) does support this.

Lada


   (i) write a block of code that most likely nobody is going to use, or
   (ii) document the limitation, spot character class subtraction in the regex, 
and flag that it is not supported (or perhaps just ignore it).
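
For engines that lack character class subtraction but do support negative lookahead, one possible hand translation of the corrected example '[A-Z-[P-R]]' is '(?![P-R])[A-Z]'. The sketch below uses Python's `re` purely for illustration; it is an assumed technique for this one pattern, not something libyang or any other tool mentioned in this thread is known to do, and it would not help with POSIX regexes, which have no lookahead.

```python
import re

# XSD form:      [A-Z-[P-R]]
# Lookahead form: reject P..R first, then accept any of A..Z.
SUBTRACTED = re.compile(r"(?![P-R])[A-Z]")


def in_subtracted_class(char):
    """True if the single character is in A-Z but not in P-R."""
    return SUBTRACTED.fullmatch(char) is not None


if __name__ == "__main__":
    for ch in "AOPQRSa":
        print(ch, in_subtracted_class(ch))
```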



If the community wants to support both XSD and POSIX expressions, then the 
proper engineering
solution is to introduce a new statement that is defined to contain a POSIX 
ex

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Andy Bierman
On Mon, Sep 4, 2017 at 7:05 AM, Robert Wilton  wrote:

> Hi Andy,
>
> On 02/09/2017 17:46, Andy Bierman wrote:
>
>
>
> On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder <
> j.schoenwael...@jacobs-university.de> wrote:
>
>> On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
>> >
>> > This is not an effort to change or bifurcate the YANG 1.1. It is simply
>> to
>> > RECOMMEND a proper subset of XSD pattern that is more portable.
>> >
>>
>> If you implement YANG as it is defined, patterns are portable. Given
>> this, I do not understand the notion of 'more portable'.
>>
>> Anyway, it seems that those who want a more portable subset do not
>> even agree on what that subset is. Perhaps people pushing for this
>> should go and write an I-D that explains why a 'more portable' subset
>> is needed (which problems are we fixing), that defines such a 'more
>> portable subset', and which includes the reasoning how the subset has
>> been determined.
>>
>>
>
> I do not agree that the YANG pattern contains a string that is both a
> POSIX and XSD regular expression.
> The RFC is very clear it contains an XSD expression. Pretending it is both
> is a hack that does not even seem
> to work 100%, so it is not reliable.
>
> I am not suggesting that the YANG pattern is both a POSIX and XSD regular
> expression.
>
> I am only suggesting that the guidelines recommend that authors use a
> subset of XSD, to make it easier to programmatically *convert* the 'XSD
> subset compliant regular expression' into a functionally equivalent regular
> expression for whatever regular expression engine the tooling decides to
> use.
>
>
Looks like you want the expression to mean the exact same thing in multiple
expression languages
and you want to put the burden of this perfect subset on humans who write
YANG.
This is a really unworkable plan.



> E.g. this seems to be the approach used by "libyang" that uses libpcre as
> the backend RE library rather than libxml.  Unfortunately, I think that the
> libyang library would currently fail if the pattern statement contained
> "[[A-Z]-[P-R]]" because it looks like the PCRE2 language does not support
> character class subtraction.  AFAICT, no standard YANG modules currently
> use character class subtraction, so the authors of libyang have a
> choice here:
>   (i) write a block of code that most likely nobody is going to use, or
>   (ii) document the limitation, spot character class subtraction in the
> regex, and flag that it is not supported (or perhaps just ignore it).
>
>
>
> If the community wants to support both XSD and POSIX expressions, then the
> proper engineering
> solution is to introduce a new statement that is defined to contain a
> POSIX expression.
> This can be done with a YANG extension now and added to YANG 2.0 later.
>
> I think that this is an inferior solution:
> - there are many languages that YANG tools could be written in: C/C++,
> Python, Java, Go, Rust, Javascript are all reasonably plausible choices.
> - they all have similar regular expression flavours, but with small
> differences (according to
> http://www.regular-expressions.info/reference.html).
> - Personally, I see no inherent advantage of the POSIX Extended Regex over
> XML RE.   In fact, given that it doesn't support Unicode at all, it would
> seem to be a somewhat strange choice for a second pattern statement.
> - Nor does it seem pragmatic to introduce lots of different flavors of
> pattern statements into YANG each supporting a different regex syntax.
>
>

You seem to be confirming that picking 1 flavor of Posix would be
impossible.
All the more reason to keep the XSD pattern unburdened.
I see no reason XSD patterns should be constrained because some
implementors want to
ignore the RFC and pretend the string is some other expression language.



> I also don't like the solution that every YANG tool maker has to either
> link against libxml2,  or write their own efficient regular expression
> engine.  I'm not convinced that what the world needs is yet more regular
> expression implementations :-)
>

Then write your own tools and don't use libxml2.



> So, I still think that the better technical solution is to always define
> the pattern statements only in the XML RE language, but to strongly
> encourage folks to use a subset of that language for standards models
> (which they appear to be doing anyway) to make it easier to convert the
> regular expression into compatible versions for other engines.
>
> Thanks,
> Rob
>
>
>

Andy


>
>
>
>> /js
>>
>
> Andy
>
>
>>
>> --
>> Juergen Schoenwaelder   Jacobs University Bremen gGmbH
>> Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
>> Fax:   +49 421 200 3103 
>>
>> ___
>> netmod mailing list
>> netmod@ietf.org
>> https://www.ietf.org/mailman/listinfo/netmod
>>
>
>
>
___
netmod mailing list
netmod@ietf

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Ladislav Lhotka
Robert Wilton wrote on Mon, 04 Sep 2017 at 15:05 +0100:
> Hi Andy,
> 
> On 02/09/2017 17:46, Andy Bierman wrote:
> > 
> > On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder 
> >  wrote:
> > > On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
> > > >
> > > > This is not an effort to change or bifurcate the YANG 1.1. It is simply
> > > > to RECOMMEND a proper subset of XSD pattern that is more portable.
> > > >
> > > 
> > > If you implement YANG as it is defined, patterns are portable. Given
> > > this, I do not understand the notion of 'more portable'.
> > > 
> > > Anyway, it seems that those who want a more portable subset do not
> > > even agree on what that subset is. Perhaps people pushing for this
> > > should go and write an I-D that explains why a 'more portable' subset
> > > is needed (which problems are we fixing), that defines such a 'more
> > > portable subset', and which includes the reasoning how the subset has
> > > been determined.
> > > 
> > > 
> > 
> > 
> > I do not agree that the YANG pattern contains a string that is both a POSIX 
> > and XSD regular expression.
> > The RFC is very clear it contains an XSD expression. Pretending it is both 
> > is a hack that does not even seem
> > to work 100%, so it is not reliable.
>  I am not suggesting that the YANG pattern is both a POSIX and XSD regular 
> expression.
> 
> I am only suggesting that the guidelines recommend that authors use a subset 
> of XSD, to make it easier to programmatically *convert* the 'XSD subset 
> compliant regular expression' into a functionally equivalent regular 
> expression for whatever regular expression engine the tooling decides to use.

And that's the point, I think: each developer needs to get a library function so
as to translate the XSD pattern into a native regex of whatever programming
language he/she is currently using. So I guess what we really need is to
identify libraries for common languages that do it correctly - or write simple
translators ourselves if none is available.
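
The translation step described above can be sketched as follows. This is a minimal sketch in Python, not a complete translator: the names `xsd_to_python` and `xsd_match` are hypothetical, and it assumes the input already avoids character class subtraction, `\p{...}` escapes and the XSD-only `\i` and `\c` shorthands. It handles only two differences: XSD patterns are implicitly anchored at both ends, and `^` and `$` outside a character class are ordinary characters in XSD but anchors in Python's `re`.

```python
import re


def xsd_to_python(pattern):
    """Compile an XSD-subset pattern as a Python regex (hypothetical helper)."""
    out = []
    depth = 0  # > 0 while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Copy an escape sequence through unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch in "^$" and depth == 0:
            # Literal in XSD, an anchor in Python: escape it.
            out.append("\\")
        out.append(ch)
        i += 1
    # Group the whole pattern so a top-level '|' stays inside the match.
    return re.compile("(?:" + "".join(out) + ")")


def xsd_match(pattern, value):
    """XSD patterns must match the whole value, hence fullmatch()."""
    return xsd_to_python(pattern).fullmatch(value) is not None
```

Using `fullmatch()` supplies the implicit anchoring without rewriting the pattern. Other differences, such as the exact meaning of `\w`, are not addressed by this sketch.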

> 
> E.g. this seems to be the approach used by "libyang" that uses libpcre as the 
> backend RE library rather than libxml.  Unfortunately, I think that the 
> libyang library would currently fail if the pattern statement contained 
> "[[A-Z]-[P-R]]" because it looks like the PCRE2 language does not support 
> character class subtraction.  AFAICT, no standard YANG modules currently 
> use character class subtraction, so the authors of libyang have a choice 
> here:

Note that your example is incorrect, it should be [A-Z-[P-R]]. FWIW, Python
module PyXB (that I used in Yangson library) does support this.

Lada

>   (i) write a block of code that most likely nobody is going to use, or 
>   (ii) document the limitation, spot character class subtraction in the 
> regex, and flag that it is not supported (or perhaps just ignore it).
> 
> 
> > If the community wants to support both XSD and POSIX expressions, then the 
> > proper engineering
> > solution is to introduce a new statement that is defined to contain a POSIX 
> > expression.
> > This can be done with a YANG extension now and added to YANG 2.0 later.
>  I think that this is an inferior solution:
> - there are many languages that YANG tools could be written in: C/C++, 
> Python, Java, Go, Rust, Javascript are all reasonably plausible choices.
> - they all have similar regular expression flavours, but with small 
> differences (according to http://www.regular-expressions.info/reference.html).
> - Personally, I see no inherent advantage of the POSIX Extended Regex over 
> XML RE.   In fact, given that it doesn't support Unicode at all, it would 
> seem to be a somewhat strange choice for a second pattern statement.
> - Nor does it seem pragmatic to introduce lots of different flavors of 
> pattern statements into YANG each supporting a different regex syntax.
> 
> I also don't like the solution that every YANG tool maker has to either link 
> against libxml2,  or write their own efficient regular expression engine.  
> I'm not convinced that what the world needs is yet more regular expression 
> implementations :-)
> 
> So, I still think that the better technical solution is to always define the 
> pattern statements only in the XML RE language, but to strongly encourage 
> folks to use a subset of that language for standards models (which they 
> appear to be doing anyway) to make it easier to convert the regular 
> expression into compatible versions for other engines.
> 
> Thanks,
> Rob
> 
> 
> >  
> > > /js
> > > 
> > 
> > Andy
> >  
> > > --
> > > Juergen Schoenwaelder   Jacobs University Bremen gGmbH
> > > Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
> > > Fax:   +49 421 200 3103 
> > > 
> > > ___
> > > netmod mailing list
> > > netmod@ietf.org
> > > https://www.ietf.org/mailman/listinfo/netmod
> > > 
>  
> ___

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Robert Wilton

Hi Carsten,

I'm slightly lost :-)

Don't you have the same issue for CDDL in that the specification 
supports the full syntax from PCRE (which appears to be one of the much 
larger and more complex regex language specifications) which will force 
implementations to use a PCRE compatible implementation?


I think that the world needs a minimal common regex language ... but 
presumably that is just walking into the XKCD trap: 
https://xkcd.com/927/ ;-)


Rob


On 04/09/2017 12:18, Carsten Bormann wrote:

I’m not going to say we have solved the underlying problem (too many flavors of 
regular expression) completely for CDDL, but in CDDL we are using PCRE with 
anchors then added:

https://tools.ietf.org/html/draft-ietf-cbor-cddl-00#section-3.8.3

(And here is the implementation:
   h[k] = Regexp.new("\\A#{k}\\z")
That should not be too hard to replicate in any language :-)

Grüße, Carsten






Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Robert Wilton

Hi Andy,


On 02/09/2017 17:46, Andy Bierman wrote:



On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder wrote:


On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
>
> This is not an effort to change or bifurcate the YANG 1.1. It is simply
> to RECOMMEND a proper subset of XSD pattern that is more portable.
>

If you implement YANG as it is defined, patterns are portable. Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset do not
even agree on what that subset is. Perhaps people pushing for this
should go and write an I-D that explains why a 'more portable' subset
is needed (which problems are we fixing), that defines such a 'more
portable subset', and which includes the reasoning how the subset has
been determined.



> I do not agree that the YANG pattern contains a string that is both a
> POSIX and XSD regular expression.
> The RFC is very clear it contains an XSD expression. Pretending it is
> both is a hack that does not even seem
> to work 100%, so it is not reliable.

I am not suggesting that the YANG pattern is both a POSIX and XSD 
regular expression.


I am only suggesting that the guidelines recommend that authors use a 
subset of XSD, to make it easier to programmatically *convert* the 'XSD 
subset compliant regular expression' into a functionally equivalent 
regular expression for whatever regular expression engine the tooling 
decides to use.


E.g. this seems to be the approach used by "libyang" that uses libpcre 
as the backend RE library rather than libxml. Unfortunately, I think 
that the libyang library would currently fail if the pattern statement 
contained "[[A-Z]-[P-R]]" because it looks like the PCRE2 language does 
not support character class subtraction.  AFAICT, no standard YANG 
modules currently use character class subtraction, so the authors of 
libyang have a choice here:

  (i) write a block of code that most likely nobody is going to use, or
  (ii) document the limitation, spot character class subtraction in the 
regex, and flag that it is not supported (or perhaps just ignore it).
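
Option (ii) could look roughly like the sketch below. It is written in Python for illustration only and is not how libyang works; the function name `xsd_only_constructs` is hypothetical. It assumes that, in an XSD pattern, an unescaped `[` inside a character class can only introduce a subtraction, and it also reports `\p{...}` and `\P{...}` escapes.

```python
def xsd_only_constructs(pattern):
    """Return (index, description) pairs for constructs to flag.

    Hypothetical helper: a single pass over the pattern that tracks
    whether the scanner is currently inside a character class.
    """
    found = []
    depth = 0  # > 0 while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # \p{...} and \P{...} are Unicode property/block escapes.
            if pattern[i + 1:i + 2] in ("p", "P"):
                found.append((i, "unicode property or block escape"))
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            if depth > 0:
                found.append((i, "character class subtraction"))
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        i += 1
    return found
```

A tool could print these findings as warnings and then either reject the pattern or skip validation for that leaf.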





> If the community wants to support both XSD and POSIX expressions, then
> the proper engineering
> solution is to introduce a new statement that is defined to contain a
> POSIX expression.
>
> This can be done with a YANG extension now and added to YANG 2.0 later.

I think that this is an inferior solution:
- there are many languages that YANG tools could be written in: C/C++, 
Python, Java, Go, Rust, Javascript are all reasonably plausible choices.
- they all have similar regular expression flavours, but with small 
differences (according to http://www.regular-expressions.info/reference.html).
- Personally, I see no inherent advantage of the POSIX Extended Regex 
over XML RE.   In fact, given that it doesn't support Unicode at all, it 
would seem to be a somewhat strange choice for a second pattern statement.
- Nor does it seem pragmatic to introduce lots of different flavors of 
pattern statements into YANG each supporting a different regex syntax.


I also don't like the solution that every YANG tool maker has to either 
link against libxml2,  or write their own efficient regular expression 
engine.  I'm not convinced that what the world needs is yet more regular 
expression implementations :-)


So, I still think that the better technical solution is to always define 
the pattern statements only in the XML RE language, but to strongly 
encourage folks to use a subset of that language for standards models 
(which they appear to be doing anyway) to make it easier to convert the 
regular expression into compatible versions for other engines.


Thanks,
Rob




/js


Andy


--
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103

___
netmod mailing list
netmod@ietf.org 
https://www.ietf.org/mailman/listinfo/netmod







Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Carsten Bormann
I’m not going to say we have solved the underlying problem (too many flavors of 
regular expression) completely for CDDL, but in CDDL we are using PCRE with 
anchors then added:

https://tools.ietf.org/html/draft-ietf-cbor-cddl-00#section-3.8.3

(And here is the implementation:
  h[k] = Regexp.new("\\A#{k}\\z")
That should not be too hard to replicate in any language :-)
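
For comparison, one possible Python equivalent of the Ruby line above is sketched here. The helper name `anchored` is hypothetical, and the non-capturing group is an addition that the Ruby line does not have: it keeps a top-level `|` in the pattern from escaping the anchors. Python's `\Z` matches only at the very end of the string, like Ruby's `\z`.

```python
import re


def anchored(pattern):
    """Wrap a pattern so that it must match the entire string."""
    # \A = start of string, \Z = end of string (no trailing newline allowed).
    return re.compile(r"\A(?:" + pattern + r")\Z")
```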

Grüße, Carsten



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-04 Thread Robert Wilton

Hi Juergen,


On 02/09/2017 08:33, Juergen Schoenwaelder wrote:

On Fri, Sep 01, 2017 at 10:45:51AM +0100, Robert Wilton wrote:

Hi Alex,


On 01/09/2017 00:57, Alex Campbell wrote:

Hi,


I'd be very wary of adding guidelines that restrict the regex syntax.


A tool that supports YANG must implement the full regex language anyway
(or ignore regexes altogether if they are not relevant to the tool's
function).


This is true if the tool is designed to work with any arbitrary YANG
module.  But if a tool only needs to work with a subset of YANG modules
(e.g. perhaps just IETF, OpenConfig, and Vendor models) then they only need
to support the subset of the XML RE language that is used by the YANG
modules that they load.


Rob,

you are on the path to create multiple flavours of YANG, an IETF
flavour, an OC flavour, a Vendor XYZ flavour - in my view this can't
be the goal of a standard.
This is actually the polar opposite of my aim.  My aim is to try and 
stop the fragmentation that is already happening in the industry, via 
what I perceive is a pragmatic compromise.  I appreciate others 
feel differently.

I.e. make XML RE easy enough to use that others in the industry don't 
feel that they need to fork YANG for this reason.  My understanding is 
that for the OpenConfig YANG model authors, the "must use an XML RE 
implementation" argument doesn't seem to be working, in that their 
models are currently defined using POSIX regex pattern statements that 
are certainly incompatible with XML RE (they have anchors).  Although my 
understanding is that they may move to define a separate "posix pattern" 
statement instead, I would guess that they will probably only use the 
"posix pattern" statement in their models and not define both the POSIX 
and XML RE expressions.


So, as it stands today, I think that we will either already end up with 
effectively multiple flavours of YANG, or that YANG implementations will 
need to support both XML RE and POSIX RE implementations.  I think that 
is a worse outcome for the wider industry than just encouraging people 
to only use a pragmatic subset of XML RE for network management YANG models.


Rob



A compliant implementation of YANG 1.0 and YANG 1.1 must handle XSD
pattern.

/js





Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-02 Thread Andy Bierman
On Sat, Sep 2, 2017 at 4:28 AM, Juergen Schoenwaelder <
j.schoenwael...@jacobs-university.de> wrote:

> On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
> >
> > This is not an effort to change or bifurcate the YANG 1.1. It is simply
> > to RECOMMEND a proper subset of XSD pattern that is more portable.
> >
>
> If you implement YANG as it is defined, patterns are portable. Given
> this, I do not understand the notion of 'more portable'.
>
> Anyway, it seems that those who want a more portable subset do not
> even agree on what that subset is. Perhaps people pushing for this
> should go and write an I-D that explains why a 'more portable' subset
> is needed (which problems are we fixing), that defines such a 'more
> portable subset', and which includes the reasoning how the subset has
> been determined.
>
>

I do not agree that the YANG pattern contains a string that is both a POSIX
and XSD regular expression.
The RFC is very clear it contains an XSD expression. Pretending it is both
is a hack that does not even seem
to work 100%, so it is not reliable.

If the community wants to support both XSD and POSIX expressions, then the
proper engineering
solution is to introduce a new statement that is defined to contain a POSIX
expression.
This can be done with a YANG extension now and added to YANG 2.0 later.



> /js
>

Andy


>
> --
> Juergen Schoenwaelder   Jacobs University Bremen gGmbH
> Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
> Fax:   +49 421 200 3103 
>
> ___
> netmod mailing list
> netmod@ietf.org
> https://www.ietf.org/mailman/listinfo/netmod
>


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-02 Thread Juergen Schoenwaelder
On Sat, Sep 02, 2017 at 10:39:57AM +, Acee Lindem (acee) wrote:
> 
> This is not an effort to change or bifurcate the YANG 1.1. It is simply to
> RECOMMEND a proper subset of XSD pattern that is more portable.
>

If you implement YANG as it is defined, patterns are portable. Given
this, I do not understand the notion of 'more portable'.

Anyway, it seems that those who want a more portable subset do not
even agree on what that subset is. Perhaps people pushing for this
should go and write an I-D that explains why a 'more portable' subset
is needed (which problems are we fixing), that defines such a 'more
portable subset', and which includes the reasoning how the subset has
been determined.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-02 Thread Acee Lindem (acee)
Juergen, 

On 9/2/17, 3:33 AM, "netmod on behalf of Juergen Schoenwaelder"
 wrote:

>On Fri, Sep 01, 2017 at 10:45:51AM +0100, Robert Wilton wrote:
>> Hi Alex,
>> 
>> 
>> On 01/09/2017 00:57, Alex Campbell wrote:
>> > 
>> > Hi,
>> > 
>> > 
>> > I'd be very wary of adding guidelines that restrict the regex syntax.
>> > 
>> > 
>> > A tool that supports YANG must implement the full regex language anyway
>> > (or ignore regexes altogether if they are not relevant to the tool's
>> > function).
>> > 
>> This is true if the tool is designed to work with any arbitrary YANG
>> module.  But if a tool only needs to work with a subset of YANG modules
>> (e.g. perhaps just IETF, OpenConfig, and Vendor models) then they only need
>> to support the subset of the XML RE language that is used by the YANG
>> modules that they load.
>>
>
>Rob,
>
>you are on the path to create multiple flavours of YANG, an IETF
>flavour, an OC flavour, a Vendor XYZ flavour - in my view this can't
>be the goal of a standard.

This is not an effort to change or bifurcate the YANG 1.1. It is simply to
RECOMMEND a proper subset of XSD pattern that is more portable.

Thanks,
Acee


>
>A compliant implementation of YANG 1.0 and YANG 1.1 must handle XSD
>pattern.
>
>/js
>
>-- 
>Juergen Schoenwaelder   Jacobs University Bremen gGmbH
>Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
>Fax:   +49 421 200 3103 
>
>___
>netmod mailing list
>netmod@ietf.org
>https://www.ietf.org/mailman/listinfo/netmod



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-02 Thread Juergen Schoenwaelder
On Wed, Aug 30, 2017 at 05:44:01PM +0100, Robert Wilton wrote:
> 
> First question: How many pattern statements in draft and standard IETF YANG
> modules actually use Unicode properties (e.g. \p{})?
> Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.
> 
> E.g.   pattern
>     '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
>   +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
>   + '(%[\p{N}\p{L}]+)?';
> 
> This could quite possibly have been written just as
> "(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.

Shorter but less precise. The thread started with a proposal to ban
\d, you seem to like it. Note that \d is not the same as [0-9] in
unicode as far as I know. \d is defined to be \p{Nd} and Nd has way
more than [0-9].

https://www.w3.org/TR/xmlschema-2/#regexs
http://www.fileformat.info/info/unicode/category/Nd/list.htm
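
The difference between \d and [0-9] can be observed in Python, whose `re` module documents `\d` on `str` patterns as matching any character in Unicode category Nd. This is only an illustration using Python's engine, not an XSD implementation; the helper name `whole_match` is hypothetical.

```python
import re
import unicodedata

ASCII_DIGITS = re.compile(r"[0-9]+")
UNICODE_DIGITS = re.compile(r"\d+")

# ARABIC-INDIC DIGIT THREE and FOUR, both in Unicode category Nd.
ARABIC_INDIC = "\u0663\u0664"


def whole_match(regex, value):
    """True if the compiled regex matches the entire value."""
    return regex.fullmatch(value) is not None


print(unicodedata.category(ARABIC_INDIC[0]))      # Nd
print(whole_match(UNICODE_DIGITS, ARABIC_INDIC))  # True
print(whole_match(ASCII_DIGITS, ARABIC_INDIC))    # False
```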

Perhaps the usage of \p{N} and \p{L} above is not quite right (I
recall that I tried to find out what exactly the rules for a zone
index are and often you find out that there is not really a precise
definition). My standpoint is that it is the WGs that are responsible
to work out the pattern; the WGs are responsible to decide how strict
they want patterns to be. The pattern in RFC6991 rejects an 'IP
address' of the form 321.1.2.3 or 01.2.3.4 and I think this is
goodness but it is ultimately a decision of the WG producing the YANG
module what the patterns should look like and how strict they are.
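
The strictness trade-off can be checked with a small Python sketch. The strict octet expression is taken from the pattern quoted above, with the optional zone suffix left out for brevity; the loose variant and the names `OCTET`, `STRICT` and `LOOSE` are illustrative assumptions, and `[0-9]` is used instead of `\d` to keep the comparison to ASCII digits.

```python
import re

# Octet expression from the quoted pattern (zone suffix omitted).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

# A shorter, less precise alternative: any one to three digits per octet.
LOOSE = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")

for candidate in ("192.0.2.1", "321.1.2.3", "01.2.3.4"):
    strict_ok = STRICT.fullmatch(candidate) is not None
    loose_ok = LOOSE.fullmatch(candidate) is not None
    print(candidate, strict_ok, loose_ok)
```

The strict pattern accepts only the first candidate, while the loose one accepts all three.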

And we should separate the discussion of how strict a pattern should
be from the discussion of using unicode constructs or other 'more
recent' constructs in pattern.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-02 Thread Juergen Schoenwaelder
On Fri, Sep 01, 2017 at 10:45:51AM +0100, Robert Wilton wrote:
> Hi Alex,
> 
> 
> On 01/09/2017 00:57, Alex Campbell wrote:
> > 
> > Hi,
> > 
> > 
> > I'd be very wary of adding guidelines that restrict the regex syntax.
> > 
> > 
> > A tool that supports YANG must implement the full regex language anyway
> > (or ignore regexes altogether if they are not relevant to the tool's
> > function).
> > 
> This is true if the tool is designed to work with any arbitrary YANG
> module.  But if a tool only needs to work with a subset of YANG modules
> (e.g. perhaps just IETF, OpenConfig, and Vendor models) then they only need
> to support the subset of the XML RE language that is used by the YANG
> modules that they load.
>

Rob,

you are on the path to create multiple flavours of YANG, an IETF
flavour, an OC flavour, a Vendor XYZ flavour - in my view this can't
be the goal of a standard.

A compliant implementation of YANG 1.0 and YANG 1.1 must handle XSD
pattern.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-09-01 Thread Robert Wilton

Hi Alex,


On 01/09/2017 00:57, Alex Campbell wrote:


Hi,


I'd be very wary of adding guidelines that restrict the regex syntax.


A tool that supports YANG must implement the full regex language 
anyway (or ignore regexes altogether if they are not relevant to the 
tool's function).


This is true if the tool is designed to work with any arbitrary YANG 
module.  But if a tool only needs to work with a subset of YANG modules 
(e.g. perhaps just IETF, OpenConfig, and Vendor models) then they only 
need to support the subset of the XML RE language that is used by the 
YANG modules that they load.


However, these guidelines will inevitably encourage some tool authors 
to take shortcuts.


Yes, that is one of the two aims of these guidelines (the other being 
ease/accuracy of comprehension).


A proper way for a tool to police this is to spot the parts of the XML RE 
language that it can't handle and to generate appropriate warnings.  I 
believe that this is easy to do.


But I would prefer that all YANG modules use the same regex syntax, 
rather than two separate regex syntaxes as is the case today 
(e.g. OpenConfig modules are using the POSIX extended RE language).




I'm sure there are already tools that take shortcuts - but if the 
shortcuts cause problems, right now it's squarely the tool's fault, 
rather than the model being processed.


If there are official guidelines saying not to use such-and-such regex 
feature, then the tool author can point to the guideline and say "See? 
You aren't supposed to be using that".


These are guidelines, not rules, and all of my proposed guidelines are 
SHOULDs not MUSTs, except that I've mandated not to use '\d' when 
matching [0-9] is what is required.  My guess is that most of the pattern 
statements that use \d instead of [0-9] are probably not what the author 
intended.


Can you provide any examples of where the XML RE constructs that I am 
suggesting avoiding are required in any standard network management YANG 
modules?




We may end up with a compatibility nightmare on our hands where 
certain modules are only compatible with certain tools, even more-so 
than is the case now.


The way I see it, adopting these guidelines is likely to improve 
compatibility rather than lessen it, since it encourages authors to 
write YANG models in a way that more tools are likely to cleanly 
interoperate with, and basically happens to be the way that folk are 
writing standards based YANG modules anyway ...


Thanks,
Rob




The only way I'd ever approve of this is if it was a MUST requirement 
in a new version of YANG.





*From:* netmod  on behalf of Robert Wilton 


*Sent:* Thursday, 31 August 2017 4:44 a.m.
*To:* Andy Bierman; Juergen Schoenwaelder; Xufeng Liu; netmod@ietf.org
*Subject:* Re: [netmod] Potential additions to rfc6087bis: RegEx 
guidelines


Hi,

On 30/08/2017 15:52, Andy Bierman wrote:



On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder wrote:


On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
>
>
> On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > Hi Andy,
> > >
> > > What I am suggesting makes it easier for readers, because I am a proponent
> > > of simpler regular expressions that are easy to read and understand.
> > >
> > > For example, I wonder how many YANG model readers would immediately
> > > comprehend what this pattern statement means:
> > >
> > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > >
> > > Does allowing such patterns really make it easier for model readers?
> > This is always difficult to judge but to be fair you have to show how
> > you express _the same_ (and not a subset) with some other kind of
> > regular expressions. (My understanding is that \p{Sc} is a currency
> > symbol.)
> Yes, the expression would cover a currency amount, along with associated
> symbol (e.g. "$200.00").
>
> If I was writing a module, I would probably use the following pattern
> statement instead, which I think a lot more people would likely be able to
> comprehend:
>
> pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency
> codes.  e.g. ("USD 200.00")

But that is not the same. Apples versus oranges. (I expect people to
tell me that (i) currency is irrelevant and (ii) that three ASCII
letter currency acronyms are better than currency symbols anyway but
this is a separate discussion

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-31 Thread Alex Campbell
Hi,


I'd be very wary of adding guidelines that restrict the regex syntax.


A tool that supports YANG must implement the full regex language anyway (or 
ignore regexes altogether if they are not relevant to the tool's function). 
However, these guidelines will inevitably encourage some tool authors to take 
shortcuts.


I'm sure there are already tools that take shortcuts - but if the shortcuts 
cause problems, right now it's squarely the tool's fault, rather than the model 
being processed.

If there are official guidelines saying not to use such-and-such regex feature, 
then the tool author can point to the guideline and say "See? You aren't 
supposed to be using that".


We may end up with a compatibility nightmare on our hands where certain modules 
are only compatible with certain tools, even more-so than is the case now.


The only way I'd ever approve of this is if it was a MUST requirement in a new 
version of YANG.




From: netmod  on behalf of Robert Wilton 

Sent: Thursday, 31 August 2017 4:44 a.m.
To: Andy Bierman; Juergen Schoenwaelder; Xufeng Liu; netmod@ietf.org
Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines


Hi,

On 30/08/2017 15:52, Andy Bierman wrote:


On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder wrote:
On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
>
>
> On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > Hi Andy,
> > >
> > > What I am suggesting makes it easier for readers, because I am a proponent
> > > of simpler regular expressions that are easy to read and understand.
> > >
> > > For example, I wonder how many YANG model readers would immediately
> > > comprehend what this pattern statement means:
> > >
> > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > >
> > > Does allowing such patterns really make it easier for model readers?
> > This is always difficult to judge but to be fair you have to show how
> > you express _the same_ (and not a subset) with some other kind of
> > regular expressions. (My understanding is that \p{Sc} is a currency
> > symbol.)
> Yes, the expression would cover a currency amount, along with associated
> symbol (e.g. "$200.00").
>
> If I was writing a module, I would probably use the following pattern
> statement instead, which I think a lot more people would likely be able to
> comprehend:
>
> pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency 
> codes.  e.g. ("USD 200.00")

But that is not the same. Apples versus oranges. (I expect people to
tell me that (i) currency is irrelevant and (ii) that three ASCII
letter currency acronyms are better than currency symbols anyway but
this is a separate discussion I am not interested in.)

> >
> > > The proposed guidelines obviously make it easier (or at least no harder)
> > > for tool makers.
> > >
> > > I agree that there is a minor impact to model writers, but really only in
> > > the sense that the guidelines would be telling them not to use the
> > > esoteric options of the XML regex syntax that they probably don't know
> > > about anyway.
> > What is 'esoteric' largely depends on your language environment. What
> > you are saying by 'do not use \p{}' is essentially 'do not use any
> > Unicode, long live ASCII'.
> No, that is not my intention, i.e. I'm not suggesting banning all use of
> \p{}, but instead limiting it to the character classes that seem like they
> may plausibly be used in standardized YANG modules.

This is entirely subjective. And if you still allow some \p{}, what is
the point of the exercise?

> I'm not trying to change what 6020/7950 defines the pattern statement as,
> just give what I perceive as some pragmatic guidance as to what parts of XML
> RE it makes sense to use in standardized YANG modules, making it easier for
> readers and implementations.
>
> I think that it is fine for companies, vendors, etc to use the full breadth
> of XML RE if they wish.

Implementations have to be prepared to handle XSD pattern if they
claim compliance to YANG 1.0 and 1.1. So all this only helps
non-compliant implementations. This may indeed be a goal - but then we
should spell this out as such - this helps non-compliant
implementations (and they may still fail on the first \p{} that
you still allow).

If implementations do not implement the YANG pattern statement but
something else, then they should ignore patterns they can't

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-31 Thread Robert Wilton

Hi Lou, all

Proposed 6087bis text inline below.

On 30/08/2017 21:28, Lou Berger wrote:


Rob

Speaking as a contributor.


On August 30, 2017 12:44:42 PM Robert Wilton  wrote:

> Hi,
>
> On 30/08/2017 15:52, Andy Bierman wrote:
>>
>>
>> On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder wrote:
>>
>> On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
>> >
>> >
>> > On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
>> > > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
>> > > > Hi Andy,
>> > > >
>> > > > What I am suggesting makes it easier for readers, because I am a proponent
>> > > > of simpler regular expressions that are easy to read and understand.
>> > > >
>> > > > For example, I wonder how many YANG model readers would immediately
>> > > > comprehend what this pattern statement means:
>> > > >
>> > > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
>> > > >
>> > > > Does allowing such patterns really make it easier for model readers?
>> > > This is always difficult to judge but to be fair you have to show how
>> > > you express _the same_ (and not a subset) with some other kind of
>> > > regular expressions. (My understanding is that \p{Sc} is a currency
>> > > symbol.)
>> > Yes, the expression would cover a currency amount, along with associated
>> > symbol (e.g. "$200.00").
>> >
>> > If I was writing a module, I would probably use the following pattern
>> > statement instead, which I think a lot more people would likely be able to
>> > comprehend:
>> >
>> > pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency
>> > codes.  e.g. ("USD 200.00")
>>
>> But that is not the same. Apples versus oranges. (I expect people to
>> tell me that (i) currency is irrelevant and (ii) that three ASCII
>> letter currency acronyms are better than currency symbols anyway but
>> this is a separate discussion I am not interested in.)
>>
>> > >
>> > > > The proposed guidelines obviously make it easier (or at least no harder) for
>> > > > tool makers.
>> > > >
>> > > > I agree that there is a minor impact to model writers, but really only in
>> > > > the sense that the guidelines would be telling them not to use the esoteric
>> > > > options of the XML regex syntax that they probably don't know about anyway.
>> > > What is 'esoteric' largely depends on your language environment. What
>> > > you are saying by 'do not use \p{}' is essentially 'do not use any
>> > > unicode long live ASCII'.
>> > No, that is not my intention, i.e. I'm not suggesting banning all use of
>> > \p{}, but instead limiting it to the character classes that seem like they
>> > may plausibly be used in standardized YANG modules.
>>
>> This is entirely subjective. And if you still allow some \p{}, what is
>> the point of the exercise?
>>
>> > I'm not trying to change what 6020/7950 defines the pattern statement as,
>> > just give what I perceive as some pragmatic guidance as to what parts of XML
>> > RE it makes sense to use in standardized YANG modules, making it easier for
>> > readers and implementations.
>> >
>> > I think that it is fine for companies, vendors, etc to use the full breadth
>> > of XML RE if they wish.
>>
>> Implementations have to be prepared to handle XSD pattern if they
>> claim compliance to YANG 1.0 and 1.1. So all this only helps
>> non-compliant implementations. This may indeed be a goal - but then we
>> should spell this out as such - this helps non-compliant
>> implementations (and they may still fail on the first \p{} that
>> you still allow).
>>
>> If implementations do not implement the YANG pattern statement but
>> something else, then they should ignore patterns they can't
>> understand and treat the pattern as if it would have been in a
>> description clause - i.e., leave it to humans to write the code that
>> implements the pattern correctly. Note that YANG does not say anything
>> about how stuff is implemented.
>>
>>
>>
>> This does not work.
>> There are 3 outcomes from the regex compiler
>>
>> 1) proper syntax was used and accepted; pattern matches correctly
>> 2) improper syntax was used and accepted; pattern matches incorrectly
>> 3) improper syntax was used and rejected; compiler error generated
>>
>> Case (2) is the really bad one and we have seen it in bug reports.
>>
>> This issue was discussed in detail for almost 2 years and the conclusion was
>> that a YANG extension would be used to specify other pattern types than
>> the XSD pattern mandated by the standard.
> I actually think that XML RE i

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Andy Bierman
Hi,

The burden this would place on YANG writers would be excessive.
We learned in SNMP-land about CLRs (clever little rules) and how they need
to be avoided. We learned that special-casing and sub-setting technology has
its own costs, which are usually more than the problem they solved
(e.g., counter names MUST be in the plural form).


Andy



On Wed, Aug 30, 2017 at 1:03 PM, Kent Watsen  wrote:

>
>
> As Andy says, readability is #1, and it follows that a restricted subset
> would be more understandable.  Standardizing this would require an update
> to RFC 7950 (read: not going to happen anytime soon).  Maybe we could start
> with just having a tool detect when something outside the common-subset is
> used.   Can a "common subset" be well-defined?  - "common" between how many
> engines? - would it be forever evolving?
>
>
>
> K. // contributor
>
>
>
>
>
> On 8/30/17, 12:44 PM, "netmod on behalf of Robert Wilton" <
> netmod-boun...@ietf.org on behalf of rwil...@cisco.com> wrote:
>
>
>
> I actually think that XML RE is a good choice for YANG pattern statements
> (because it is one of the more simple RE languages), I just don't think
> that we need all of it.
>
>
> First question: How many pattern statements in draft and standard IETF
> YANG modules actually use Unicode properties (e.g. \p{})?
> Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.
>
> E.g.   pattern
> '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
>   +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
>   + '(%[\p{N}\p{L}]+)?';
>
> This could quite possibly have been written just as
> "(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.
>
> There are a couple more occurrences of Unicode character classes in the vendor
> models on github, but only to restrict them to the ASCII character set (oh
> the irony), which I believe can be accomplished without resorting to
> Unicode properties.
>
>
> Another question: How often is character class subtraction (e.g.
> [A-Z-[PQ]]) used in standard & the github YANG modules?
> Answer: 0.  AFAICT, it isn't used at all, anywhere ...
>
>
>
> Now, I'm not proposing using a different regex syntax for pattern
> statements, just a sensible subset of XSD RE, such that it is easier for folks
> to read/review pattern statements, and it is easier for client and server
> implementations to translate into other common regex implementations if
> they so wish.
>
> Of course, as part of that translation, I would expect a translation
> function to check and generate an error if the translation cannot handle
> the input regex (e.g. if it uses an obscure unmatched unicode property or a
> unicode block, or character class subtraction syntax).  This really doesn't
> seem hard to me.
>
> But the XML RE language has stuff in it that I don't think anyone is ever
> going to use in a standardized network management YANG model.   Forcing
> everyone to implement support for this stuff just seems like a complete
> waste of time and effort.  Looking at the regex info website it looks like
> there are about 143 unicode properties and blocks defined (it may be
> incomplete), of which I think that 135+ of these probably have no relevance
> in network management YANG modules, and the benefit of the remaining ones
> is pretty suspect.
>
> I mean, how many network management YANG modules really need a pattern
> statement that only matches Runic characters?  Perhaps someone out there is
> busy defining "middle-earth.yang" ;-)
>
> If I am the only person opposed to making life unnecessarily difficult for
> readers of YANG models, and client/server tool implementors interacting
> with YANG then it is probably time to give up this discussion. ;-)
>
> Python, quite likely a common tool for client side network management,
> also doesn't seem to have any support of unicode properties or blocks.
> Perhaps implementations will hook it up to libxml2 instead, or write a full
> XML RE to Python RE translation tool.  But probably most people
> will just feed the pattern statement into the native Python regex engine,
> and my guess is that this will probably work 95% of the time.  The other 5%
> ... who knows what will happen ... oh well, better to try and fail than to
> not try at all.
>
> Apologies if this email comes across as a rant.
>
> Rob
>
>
>
>
___
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Lou Berger

Rob

Speaking as a contributor.


On August 30, 2017 12:44:42 PM Robert Wilton  wrote:


Hi,

On 30/08/2017 15:52, Andy Bierman wrote:



On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder
<j.schoenwael...@jacobs-university.de> wrote:

On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
>
>
> On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > Hi Andy,
> > >
> > > What I am suggesting makes it easier for readers, because I
am a proponent
> > > of simpler regular expressions that are easy to read and
understand.
> > >
> > > For example, I wonder how many YANG model readers would
immediately
> > > comprehend what this pattern statement means:
> > >
> > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > >
> > > Does allowing such patterns really make it easier for model
readers?
> > This is always difficult to judge but to be fair you have to
show how
> > you express _the same_ (and not a subset) with some other kind of
> > regular expressions. (My understanding is that \p{Sc} is a
currency
> > symbol.)
> Yes, the expression would cover a currency amount, along with
associated
> symbol (e.g. "$200.00").
>
> If I was writing a module, I would probably use the following
pattern
> statement instead, which I think a lot more people would likely
be able to
> comprehend:
>
> pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217,
currency codes.  e.g. ("USD 200.00")

But that is not the same. Apples versus oranges. (I expect people to
tell me that (i) currency is irrelevant and (ii) that three ASCII
letter currency acronyms are better than currency symbols anyway but
this is a separate discussion I am not interested in.)

> >
> > > The proposed guidelines obviously make it easier (or at least no harder) for
> > > tool makers.
> > >
> > > I agree that there is a minor impact to model writers, but really only in
> > > the sense that the guidelines would be telling them not to use the esoteric
> > > options of the XML regex syntax that they probably don't know about anyway.
> > What is 'esoteric' largely depends on your language
environment. What
> > you are saying by 'do not use \p{}' is essentially 'do not use any
> > unicode long live ASCII'.
> No, that is not my intention, i.e. I'm not suggesting banning
all use of
> \p{}, but instead limiting it to the character classes that seem
like they
> may plausibly be used in standardized YANG modules.

This is entirely subjective. And if you still allow some \p{}, what is
the point of the exercise?

> I'm not trying to change what 6020/7950 defines the pattern
statement as,
> just give what I perceive as some pragmatic guidance as to what
parts of XML
> RE it makes sense to use in standardized YANG modules, making it
easier for
> readers and implementations.
>
> I think that it is fine for companies, vendors, etc to use the
full breadth
> of XML RE if they wish.

Implementations have to be prepared to handle XSD pattern if they
claim compliance to YANG 1.0 and 1.1. So all this only helps
non-compliant implementations. This may indeed be a goal - but then we
should spell this out as such - this helps non-compliant
implementations (and they may still fail on the first \p{} that
you still allow).

If implementations do not implement the YANG pattern statement but
something else, then they should ignore patterns they can't
understand and treat the pattern as if it would have been in a
description clause - i.e., leave it to humans to write the code that
implements the pattern correctly. Note that YANG does not say anything
about how stuff is implemented.



This does not work.
There are 3 outcomes from the regex compiler

1) proper syntax was used and accepted; pattern matches correctly
2) improper syntax was used and accepted; pattern matches incorrectly
3) improper syntax was used and rejected; compiler error generated

Case (2) is the really bad one and we have seen it in bug reports.

This issue was discussed in detail for almost 2 years and the
conclusion was
that a YANG extension would be used to specify other pattern types than
the XSD pattern mandated by the standard.

I actually think that XML RE is a good choice for YANG pattern
statements (because it is one of the more simple RE languages), I just
don't think that we need all of it.


First question: How many pattern statements in draft and standard IETF
YANG modules actually use Unicode properties (e.g. \p{})?
Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.

E.g.   pattern
     '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
   +  '([

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Kent Watsen

As Andy says, readability is #1, and it follows that a restricted subset would 
be more understandable.  Standardizing this would require an update to RFC 7950 
(read: not going to happen anytime soon).  Maybe we could start with just 
having a tool detect when something outside the common-subset is used.   Can a 
"common subset" be well-defined?  - "common" between how many engines? - would 
it be forever evolving?

K. // contributor


On 8/30/17, 12:44 PM, "netmod on behalf of Robert Wilton"
<netmod-boun...@ietf.org on behalf of rwil...@cisco.com> wrote:

I actually think that XML RE is a good choice for YANG pattern statements 
(because it is one of the more simple RE languages), I just don't think that we 
need all of it.


First question: How many pattern statements in draft and standard IETF YANG 
modules actually use Unicode properties (e.g. \p{})?
Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.

E.g.   pattern
'(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
  +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
  + '(%[\p{N}\p{L}]+)?';

This could quite possibly have been written just as
"(\d{1,3}\.){3}\d{1,3}(%\w+)?" and not use Unicode properties at all.

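To make the trade-off between the precise and the simple pattern concrete, here is a minimal Python sketch. It assumes the intended simplified pattern is (\d{1,3}\.){3}\d{1,3}(%\w+)?, drops the zone suffix from the strict pattern because Python's re module has no \p{N} or \p{L}, and uses re.fullmatch with re.ASCII to approximate XSD's whole-value matching on ASCII digits. The names OCTET, STRICT, LOOSE and check are made up for illustration.

```python
import re

# Octet-precise IPv4 pattern from the example above, without the zone suffix
# (Python's re module does not support \p{N} or \p{L}).
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
STRICT = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

# Assumed reading of the simplified pattern; re.ASCII keeps \d and \w to ASCII.
LOOSE = re.compile(r"(\d{1,3}\.){3}\d{1,3}(%\w+)?", re.ASCII)


def check(value):
    """Return (strict_ok, loose_ok) for a candidate address string."""
    # XSD patterns match the whole value, so use fullmatch rather than match.
    return (STRICT.fullmatch(value) is not None,
            LOOSE.fullmatch(value) is not None)


if __name__ == "__main__":
    for candidate in ["192.0.2.1", "999.0.2.1", "192.0.2.1%eth0", "192.0.2"]:
        print(candidate, check(candidate))
```

The loose pattern accepts "999.0.2.1", which the strict one rejects, so the simpler form trades some precision for readability.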
There are a couple more occurrences of Unicode character classes in the vendor 
models on github, but only to restrict them to the ASCII character set (oh the 
irony), which I believe can be accomplished without resorting to Unicode 
properties.


Another question: How often is character class subtraction (e.g. [A-Z-[PQ]])
used in standard & the github YANG modules?
Answer: 0.  AFAICT, it isn't used at all, anywhere ...



Now, I'm not proposing using a different regex syntax for pattern statements, 
just a sensible subset of XSD RE, such that it is easier for folks to read/review 
pattern statements, and it is easier for client and server implementations to 
translate into other common regex implementations if they so wish.

Of course, as part of that translation, I would expect a translation function 
to check and generate an error if the translation cannot handle the input regex 
(e.g. if it uses an obscure unmatched unicode property or a unicode block, or 
character class subtraction syntax).  This really doesn't seem hard to me.
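A translation function of that kind could be sketched roughly as below. This is only an illustration under stated assumptions: the names UnsupportedPatternError, translate and matches are hypothetical, the checks are deliberately conservative (a literal "-[" or an escaped backslash followed by "p{" would also be rejected), and the sketch makes no attempt to cover every difference between XSD and Python regular expressions.

```python
import re


class UnsupportedPatternError(ValueError):
    """Raised when an XSD pattern uses a construct this sketch cannot translate."""


# Constructs named in the text above; the detectors are illustrative and
# intentionally err on the side of rejecting too much.
_UNSUPPORTED = [
    (re.compile(r"\\[pP]\{"), "Unicode property or block escape"),
    (re.compile(r"-\["), "character class subtraction"),
]


def translate(xsd_pattern):
    """Compile an XSD pattern with Python's re, or raise UnsupportedPatternError."""
    for detector, reason in _UNSUPPORTED:
        if detector.search(xsd_pattern):
            raise UnsupportedPatternError(f"{reason} in {xsd_pattern!r}")
    try:
        return re.compile(xsd_pattern)
    except re.error as exc:
        # Anything else the Python engine refuses is reported the same way.
        raise UnsupportedPatternError(f"{exc} in {xsd_pattern!r}") from exc


def matches(compiled, value):
    # XSD patterns are implicitly anchored at both ends of the value.
    return compiled.fullmatch(value) is not None
```

With this, translate(r"[A-Z]{3}\s?\d+\.\d{2}") returns a usable pattern, while translate(r"[a-z-[aeiou]]") raises an error instead of silently matching the wrong strings.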

But the XML RE language has stuff in it that I don't think anyone is ever going 
to use in a standardized network management YANG model.   Forcing everyone to 
implement support for this stuff just seems like a complete waste of time and 
effort.  Looking at the regex info website it looks like there are about 143 
unicode properties and blocks defined (it may be incomplete), of which I think 
that 135+ of these probably have no relevance in network management YANG 
modules, and the benefit of the remaining ones is pretty suspect.

I mean, how many network management YANG modules really need a pattern 
statement that only matches Runic characters?  Perhaps someone out there is 
busy defining "middle-earth.yang" ;-)

If I am the only person opposed to making life unnecessarily difficult for 
readers of YANG models, and client/server tool implementors interacting with 
YANG then it is probably time to give up this discussion. ;-)

Python, quite likely a common tool for client side network management, also 
doesn't seem to have any support of unicode properties or blocks.  Perhaps 
implementations will hook it up to libxml2 instead, or write a full XML RE to 
Python RE translation tool.  But probably most people will just feed 
the pattern statement into the native Python regex engine, and my guess is that 
this will probably work 95% of the time.  The other 5% ... who knows what will 
happen ... oh well, better to try and fail than to not try at all.
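As a rough illustration of what "just feeding the pattern into the native Python regex engine" involves, the sketch below shows two details that are easy to miss. The helper names naive_check and anchored_check are hypothetical. The first point is that re.match only anchors at the start of the value, whereas XSD patterns must match the whole value. The second is that, for str patterns, Python's \d matches any Unicode decimal digit (as XSD's \d does), which may surprise authors who expected only 0-9.

```python
import re


def naive_check(pattern, value):
    """What a quick port might do: re.match anchors only at the start."""
    return re.match(pattern, value) is not None


def anchored_check(pattern, value):
    """Closer to XSD semantics: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # Trailing junk slips through the naive check but not the anchored one.
    print(naive_check(r"\d+", "123abc"), anchored_check(r"\d+", "123abc"))
    # ARABIC-INDIC DIGIT THREE and FOUR: accepted by \d, rejected by [0-9].
    arabic_indic = "\u0663\u0664"
    print(anchored_check(r"\d+", arabic_indic),
          anchored_check(r"[0-9]+", arabic_indic))
```

Using fullmatch and spelling out [0-9] where ASCII digits are meant avoids both surprises.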

Apologies if this email comes across as a rant.

Rob





Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Robert Wilton

Hi,

On 30/08/2017 15:52, Andy Bierman wrote:



On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder wrote:


On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
>
>
> On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > Hi Andy,
> > >
> > > What I am suggesting makes it easier for readers, because I
am a proponent
> > > of simpler regular expressions that are easy to read and
understand.
> > >
> > > For example, I wonder how many YANG model readers would
immediately
> > > comprehend what this pattern statement means:
> > >
> > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > >
> > > Does allowing such patterns really make it easier for model
readers?
> > This is always difficult to judge but to be fair you have to
show how
> > you express _the same_ (and not a subset) with some other kind of
> > regular expressions. (My understanding is that \p{Sc} is a
currency
> > symbol.)
> Yes, the expression would cover a currency amount, along with
associated
> symbol (e.g. "$200.00").
>
> If I was writing a module, I would probably use the following
pattern
> statement instead, which I think a lot more people would likely
be able to
> comprehend:
>
> pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217,
currency codes.  e.g. ("USD 200.00")

But that is not the same. Apples versus oranges. (I expect people to
tell me that (i) currency is irrelevant and (ii) that three ASCII
letter currency acronyms are better than currency symbols anyway but
this is a separate discussion I am not interested in.)

> >
> > > The proposed guidelines obviously make it easier (or at least no harder) for
> > > tool makers.
> > >
> > > I agree that there is a minor impact to model writers, but really only in
> > > the sense that the guidelines would be telling them not to use the esoteric
> > > options of the XML regex syntax that they probably don't know about anyway.
> > What is 'esoteric' largely depends on your language
environment. What
> > you are saying by 'do not use \p{}' is essentially 'do not use any
> > unicode long live ASCII'.
> No, that is not my intention, i.e. I'm not suggesting banning
all use of
> \p{}, but instead limiting it to the character classes that seem
like they
> may plausibly be used in standardized YANG modules.

This is entirely subjective. And if you still allow some \p{}, what is
the point of the exercise?

> I'm not trying to change what 6020/7950 defines the pattern
statement as,
> just give what I perceive as some pragmatic guidance as to what
parts of XML
> RE it makes sense to use in standardized YANG modules, making it
easier for
> readers and implementations.
>
> I think that it is fine for companies, vendors, etc to use the
full breadth
> of XML RE if they wish.

Implementations have to be prepared to handle XSD pattern if they
claim compliance to YANG 1.0 and 1.1. So all this only helps
non-compliant implementations. This may indeed be a goal - but then we
should spell this out as such - this helps non-compliant
implementations (and they may still fail on the first \p{} that
you still allow).

If implementations do not implement the YANG pattern statement but
something else, then they should ignore patterns they can't
understand and treat the pattern as if it would have been in a
description clause - i.e., leave it to humans to write the code that
implements the pattern correctly. Note that YANG does not say anything
about how stuff is implemented.



This does not work.
There are 3 outcomes from the regex compiler

1) proper syntax was used and accepted; pattern matches correctly
2) improper syntax was used and accepted; pattern matches incorrectly
3) improper syntax was used and rejected; compiler error generated

Case (2) is the really bad one and we have seen it in bug reports.

This issue was discussed in detail for almost 2 years and the conclusion was
that a YANG extension would be used to specify other pattern types than
the XSD pattern mandated by the standard.

I actually think that XML RE is a good choice for YANG pattern 
statements (because it is one of the more simple RE languages), I just 
don't think that we need all of it.



First question: How many pattern statements in draft and standard IETF 
YANG modules actually use Unicode properties (e.g. \p{})?

Answer: Just 2.  To add a zone at the end of the IPv4/IPv6 address.

E.g.   pattern
    '(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}'
  +  '([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'
  + '(%[\p{N}\p{L}]+)?';

Th

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Andy Bierman
On Wed, Aug 30, 2017 at 5:31 AM, Juergen Schoenwaelder <
j.schoenwael...@jacobs-university.de> wrote:

> On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
> >
> >
> > On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > > Hi Andy,
> > > >
> > > > What I am suggesting makes it easier for readers, because I am a
> proponent
> > > > of simpler regular expressions that are easy to read and understand.
> > > >
> > > > For example, I wonder how many YANG model readers would immediately
> > > > comprehend what this pattern statement means:
> > > >
> > > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > > >
> > > > Does allowing such patterns really make it easier for model readers?
> > > This is always difficult to judge but to be fair you have to show how
> > > you express _the same_ (and not a subset) with some other kind of
> > > regular expressions. (My understanding is that \p{Sc} is a currency
> > > symbol.)
> > Yes, the expression would cover a currency amount, along with associated
> > symbol (e.g. "$200.00").
> >
> > If I was writing a module, I would probably use the following pattern
> > statement instead, which I think a lot more people would likely be able
> to
> > comprehend:
> >
> > pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency
> codes.  e.g. ("USD 200.00")
>
> But that is not the same. Apples versus oranges. (I expect people to
> tell me that (i) currency is irrelevant and (ii) that three ASCII
> letter currency acronyms are better than currency symbols anyway but
> this is a separate discussion I am not interested in.)
>
> > >
> > > > The proposed guidelines obviously make it easier (or at least no
> > > > harder) for tool makers.
> > > >
> > > > I agree that there is a minor impact to model writers, but really
> > > > only in the sense that the guidelines would be telling them not to use
> > > > the esoteric options of the XML regex syntax that they probably don't
> > > > know about anyway.
> > > What is 'esoteric' largely depends on your language environment. What
> > > you are saying by 'do not use \p{}' is essentially 'do not use any
> > > unicode long live ASCII'.
> > No, that is not my intention, i.e. I'm not suggesting banning all use of
> > \p{}, but instead limiting it to the character classes that seem like
> they
> > may plausibly be used in standardized YANG modules.
>
> This is entirely subjective. And if you still allow some \p{}, what is
> the point of the exercise?
>
> > I'm not trying to change what 6020/7950 defines the pattern statement as,
> > just give what I perceive as some pragmatic guidance as to what parts of
> XML
> > RE it makes sense to use in standardized YANG modules, making it easier
> for
> > readers and implementations.
> >
> > I think that it is fine for companies, vendors, etc to use the full
> breadth
> > of XML RE if they wish.
>
> Implementations have to be prepared to handle XSD pattern if they
> claim compliance to YANG 1.0 and 1.1. So all this only helps
> non-compliant implementations. This may indeed be a goal - but then we
> should spell this out as such - this helps non-compliant
> implementations (and they may still fail on the first \p{} that
> you still allow).
>
> If implementations do not implement the YANG pattern statement but
> something else, then they should ignore patterns they can't
> understand and treat the pattern as if it would have been in a
> description clause - i.e., leave it to humans to write the code that
> implements the pattern correctly. Note that YANG does not say anything
> about how stuff is implemented.
>


This does not work.
There are 3 outcomes from the regex compiler

1) proper syntax was used and accepted; pattern matches correctly
2) improper syntax was used and accepted; pattern matches incorrectly
3) improper syntax was used and rejected; compiler error generated

Case (2) is the really bad one and we have seen it in bug reports.

This issue was discussed in detail for almost 2 years and the conclusion was
that a YANG extension would be used to specify other pattern types than
the XSD pattern mandated by the standard.
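
The three outcomes can be illustrated with a small Python sketch. Note the assumption: it shows the mismatch from the other side, i.e. XSD syntax handed to Python's re engine, since no XSD engine is available in the standard library. The function name classify and its "expected" argument (the result a conforming XSD engine should give) are hypothetical.

```python
import re


def classify(xsd_pattern, value, expected):
    """Classify what Python's re does with a pattern written in XSD syntax.

    `expected` is the answer a conforming XSD engine should give for `value`.
    """
    try:
        compiled = re.compile(xsd_pattern)
    except re.error:
        return "rejected"  # outcome 3: loud failure at compile time
    actual = compiled.fullmatch(value) is not None
    if actual == expected:
        return "accepted, correct"  # outcome 1
    return "accepted, incorrect"  # outcome 2: the silent, dangerous case


if __name__ == "__main__":
    # Common-subset syntax behaves the same in both engines.
    print(classify(r"[0-9]+", "42", True))
    # XSD class subtraction: "b" is a consonant, so XSD should accept it.
    # Python parses this as a plain character class followed by a literal "]".
    print(classify(r"[a-z-[aeiou]]", "b", True))
    # Unicode property escapes are not part of Python's re syntax.
    print(classify(r"\p{L}+", "abc", True))
```

The character class subtraction case compiles without complaint and then gives the wrong answer, which is exactly why case (2) is the one to worry about.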


> /js
>
>
Andy


> --
> Juergen Schoenwaelder   Jacobs University Bremen gGmbH
> Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
> Fax:   +49 421 200 3103 
>


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Juergen Schoenwaelder
On Wed, Aug 30, 2017 at 12:48:19PM +0100, Robert Wilton wrote:
> 
> 
> On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> > On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > > Hi Andy,
> > > 
> > > What I am suggesting makes it easier for readers, because I am a proponent
> > > of simpler regular expressions that are easy to read and understand.
> > > 
> > > For example, I wonder how many YANG model readers would immediately
> > > comprehend what this pattern statement means:
> > > 
> > > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> > > 
> > > Does allowing such patterns really make it easier for model readers?
> > This is always difficult to judge but to be fair you have to show how
> > you express _the same_ (and not a subset) with some other kind of
> > regular expressions. (My understanding is that \p{Sc} is a currency
> > symbol.)
> Yes, the expression would cover a currency amount, along with associated
> symbol (e.g. "$200.00").
> 
> If I was writing a module, I would probably use the following pattern
> statement instead, which I think a lot more people would likely be able to
> comprehend:
> 
> pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency 
> codes.  e.g. ("USD 200.00")

But that is not the same. Apples versus oranges. (I expect people to
tell me that (i) currency is irrelevant and (ii) that three ASCII
letter currency acronyms are better than currency symbols anyway but
this is a separate discussion I am not interested in.)

> > 
> > > The proposed guidelines obviously make it easier (or at least no harder) for
> > > tool makers.
> > > 
> > > I agree that there is a minor impact to model writers, but really only in
> > > the sense that the guidelines would be telling them not to use the esoteric
> > > options of the XML regex syntax that they probably don't know about anyway.
> > What is 'esoteric' largely depends on your language environment. What
> > you are saying by 'do not use \p{}' is essentially 'do not use any
> > unicode long live ASCII'.
> No, that is not my intention, i.e. I'm not suggesting banning all use of
> \p{}, but instead limiting it to the character classes that seem like they
> may plausibly be used in standardized YANG modules.

This is entirely subjective. And if you still allow some \p{}, what is
the point of the exercise?

> I'm not trying to change what 6020/7950 defines the pattern statement as,
> just give what I perceive as some pragmatic guidance as to what parts of XML
> RE it makes sense to use in standardized YANG modules, making it easier for
> readers and implementations.
> 
> I think that it is fine for companies, vendors, etc to use the full breadth
> of XML RE if they wish.

Implementations have to be prepared to handle XSD pattern if they
claim compliance to YANG 1.0 and 1.1. So all this only helps
non-compliant implementations. This may indeed be a goal - but then we
should spell this out as such - this helps non-compliant
implementations (and they may still fail on the first \p{} that
you still allow).

If implementations do not implement the YANG pattern statement but
something else, then they should ignore patterns they can't
understand and treat the pattern as if it would have been in a
description clause - i.e., leave it to humans to write the code that
implements the pattern correctly. Note that YANG does not say anything
about how stuff is implemented.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 

___
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Robert Wilton



On 30/08/2017 11:29, Juergen Schoenwaelder wrote:
> On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> > Hi Andy,
> >
> > What I am suggesting makes it easier for readers, because I am a proponent
> > of simpler regular expressions that are easy to read and understand.
> >
> > For example, I wonder how many YANG model readers would immediately
> > comprehend what this pattern statement means:
> >
> > pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> >
> > Does allowing such patterns really make it easier for model readers?
>
> This is always difficult to judge but to be fair you have to show how
> you express _the same_ (and not a subset) with some other kind of
> regular expressions. (My understanding is that \p{Sc} is a currency
> symbol.)

Yes, the expression would cover a currency amount, along with associated
symbol (e.g. "$200.00").

If I was writing a module, I would probably use the following pattern
statement instead, which I think a lot more people would likely be able
to comprehend:

pattern "[A-Z]{3}\s?\d+\.\d{2}", using the 3 letter, ISO 4217, currency
codes, e.g. ("USD 200.00")

> > The proposed guidelines obviously make it easier (or at least no harder)
> > for tool makers.
> >
> > I agree that there is a minor impact to model writers, but really only in
> > the sense that the guidelines would be telling them not to use the esoteric
> > options of the XML regex syntax that they probably don't know about anyway.
>
> What is 'esoteric' largely depends on your language environment. What
> you are saying by 'do not use \p{}' is essentially 'do not use any
> Unicode, long live ASCII'.

No, that is not my intention, i.e. I'm not suggesting banning all use of
\p{}, but instead limiting it to the character classes that seem like
they may plausibly be used in standardized YANG modules.

I'm not trying to change what 6020/7950 defines the pattern statement
as, just give what I perceive as some pragmatic guidance as to what
parts of XML RE it makes sense to use in standardized YANG modules,
making it easier for readers and implementations.

I think that it is fine for companies, vendors, etc to use the full
breadth of XML RE if they wish.

Thanks,
Rob

> /js






Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Juergen Schoenwaelder
On Wed, Aug 30, 2017 at 10:16:30AM +0100, Robert Wilton wrote:
> Hi Andy,
> 
> What I am suggesting makes it easier for readers, because I am a proponent
> of simpler regular expressions that are easy to read and understand.
> 
> For example, I wonder how many YANG model readers would immediately
> comprehend what this pattern statement means:
> 
> pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?
> 
> Does allowing such patterns really make it easier for model readers?

This is always difficult to judge but to be fair you have to show how
you express _the same_ (and not a subset) with some other kind of
regular expressions. (My understanding is that \p{Sc} is a currency
symbol.)

> The proposed guidelines obviously make it easier (or at least no harder) for
> tool makers.
> 
> I agree that there is a minor impact to model writers, but really only in
> the sense that the guidelines would be telling them not to use the esoteric
> options of the XML regex syntax that they probably don't know about anyway.

What is 'esoteric' largely depends on your language environment. What
you are saying by 'do not use \p{}' is essentially 'do not use any
Unicode, long live ASCII'.
 
/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-30 Thread Robert Wilton

Hi Andy,

What I am suggesting makes it easier for readers, because I am a 
proponent of simpler regular expressions that are easy to read and 
understand.


For example, I wonder how many YANG model readers would immediately 
comprehend what this pattern statement means:


pattern "\p{Sc}\p{Zs}?\p{Nd}+\.\p{Nd}{2}"?

Does allowing such patterns really make it easier for model readers?

The proposed guidelines obviously make it easier (or at least no harder)
for tool makers.


I agree that there is a minor impact to model writers, but really only
in the sense that the guidelines would be telling them not to use the
esoteric options of the XML regex syntax that they probably don't know
about anyway.


If explicitly putting this in the YANG author guidelines is not liked,
then another possible option could be a softer recommendation in the
guidelines RFC, with some more explicit examples of stuff to avoid on a
YANG FAQ Wiki page.


Thanks,
Rob


On 29/08/2017 17:15, Andy Bierman wrote:

Hi,

I agree with Juergen that these proposed guidelines are not a good idea.
The priority order for YANG is (1) readers (2) writers and (3) toolmakers.
It seems trivial for group (3) to convert the XSD pattern to some
other format.
It seems difficult to train all the people in groups (1) and (2) that
there are lots of special new rules to learn.


Andy


On Tue, Aug 29, 2017 at 7:27 AM, Robert Wilton wrote:



On 28/08/2017 16:46, Juergen Schoenwaelder wrote:

On Mon, Aug 28, 2017 at 12:58:59PM +, Xufeng Liu wrote:

[Xufeng] [0-9] is still compliant with the XSD pattern specified by
YANG 1.0 and 1.1. Using [0-9] instead of [\d] will make the
implementations with native POSIX RegEx easier without the need for
a tool to inspect every element of the RegEx pattern.

Yes, but then \d is legal in YANG (and it is used in a couple of
published RFCs).

I entirely agree that YANG regular expressions must be legal XML
Schema regular expressions.

However, I don't think that the majority of YANG implementations
are going to want to either use libxml or write their own
implementation of the XML RE language.  Instead it is desirable
that they can use whatever standard regex implementation comes
with their language, or is readily available in a library.

Most of the pattern statements I see in YANG modules use a basic
subset of regular expressions, and hence it looks like they can
often be used by most RE engines, perhaps with some trivial tweaks
or conversions.  However, there is no formal guidance recommending
that pattern statements in standard modules are restricted to a
subset of XML RE.

Hence, ideally I would like 6087bis to state that pattern
statements SHOULD also conform to the following additional RE
syntax restrictions, which I think should make them easy to
convert to most other standard regex implementations (subject to
unicode support limitations):

(1) Allow \d, \D, \s, \S, \w and \W; but not inside character classes.
(2) Disallow \i, \c; and their negative equivalents.
(3) Disallow character class subtraction (e.g. "[A-Z-[RW]]").
(4) Limit the supported unicode categories to only the following
8.  Both \p and \P syntax is supported, but not inside character
classes:
  \p{L} or any kind of letter from any language.
  \p{Ll} a lowercase letter that has an uppercase variant.
  \p{Lu} an uppercase letter that has a lowercase variant.
  \p{Z} any kind of whitespace or invisible separator.
  \p{Zs} a whitespace character that is invisible, but does take
up space.
  \p{Zl} a line separator character U+2028.
  \p{N} any kind of numeric character in any script.
  \p{Nd}: a digit zero through nine in any script except ideographic
(5) Disallow matching of unicode blocks.

Thanks,
Rob



Educating _all_ module authors to write [0-9] instead of \d will
likely be more expensive than improving the code of implementations
that did not implement YANG entirely to accept \d.

/js






Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-29 Thread Andy Bierman
Hi,

I agree with Juergen that these proposed guidelines are not a good idea.
The priority order for YANG is (1) readers (2) writers and (3) toolmakers.
It seems trivial for group (3) to convert the XSD pattern to some other
format.
It seems difficult to train all the people in groups (1) and (2) that there
are lots of special new rules to learn.


Andy


On Tue, Aug 29, 2017 at 7:27 AM, Robert Wilton  wrote:

>
> On 28/08/2017 16:46, Juergen Schoenwaelder wrote:
>
> On Mon, Aug 28, 2017 at 12:58:59PM +, Xufeng Liu wrote:
>
> [Xufeng] [0-9] is still compliant with the XSD pattern specified by
> YANG 1.0 and 1.1. Using [0-9] instead of [\d] will make the
> implementations with native POSIX RegEx easier without the need for
> a tool to inspect every element of the RegEx pattern.
>
> Yes, but then \d is legal in YANG (and it is used in a couple of
> published RFCs).
>
> I entirely agree that YANG regular expressions must be legal XML Schema
> regular expressions.
>
> However, I don't think that the majority of YANG implementations are going
> to want to either use libxml or write their own implementation of the XML
> RE language.  Instead it is desirable that they can use whatever standard
> regex implementation comes with their language, or is readily available in
> a library.
>
> Most of the pattern statements I see in YANG modules use a basic subset of
> regular expressions, and hence it looks like they can often be used by most
> RE engines, perhaps with some trivial tweaks or conversions.  However,
> there is no formal guidance recommending that pattern statements in
> standard modules are restricted to a subset of XML RE.
>
> Hence, ideally I would like 6087bis to state that pattern statements
> SHOULD also conform to the following additional RE syntax restrictions,
> which I think should make them easy to convert to most other standard regex
> implementations (subject to unicode support limitations):
>
> (1) Allow \d, \D, \s, \S, \w and \W; but not inside character classes.
> (2) Disallow \i, \c; and their negative equivalents.
> (3) Disallow character class subtraction (e.g. "[A-Z-[RW]]").
> (4) Limit the supported unicode categories to only the following 8.  Both
> \p and \P syntax is supported, but not inside character classes:
>   \p{L} or any kind of letter from any language.
>   \p{Ll} a lowercase letter that has an uppercase variant.
>   \p{Lu} an uppercase letter that has a lowercase variant.
>   \p{Z} any kind of whitespace or invisible separator.
>   \p{Zs} a whitespace character that is invisible, but does take up space.
>   \p{Zl} a line separator character U+2028.
>   \p{N} any kind of numeric character in any script.
>   \p{Nd}: a digit zero through nine in any script except ideographic
> (5) Disallow matching of unicode blocks.
>
> Thanks,
> Rob
>
>
> Educating _all_ module authors to write [0-9] instead of \d will
> likely be more expensive than improving the code of implementations
> that did not implement YANG entirely to accept \d.
>
> /js


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-29 Thread Robert Wilton


On 28/08/2017 16:46, Juergen Schoenwaelder wrote:
> On Mon, Aug 28, 2017 at 12:58:59PM +, Xufeng Liu wrote:
> > [Xufeng] [0-9] is still compliant with the XSD pattern specified by
> > YANG 1.0 and 1.1. Using [0-9] instead of [\d] will make the
> > implementations with native POSIX RegEx easier without the need for
> > a tool to inspect every element of the RegEx pattern.
>
> Yes, but then \d is legal in YANG (and it is used in a couple of
> published RFCs).

I entirely agree that YANG regular expressions must be legal XML Schema
regular expressions.


However, I don't think that the majority of YANG implementations are 
going to want to either use libxml or write their own implementation of 
the XML RE language.  Instead it is desirable that they can use whatever 
standard regex implementation comes with their language, or is readily 
available in a library.


Most of the pattern statements I see in YANG modules use a basic subset 
of regular expressions, and hence it looks like they can often be used 
by most RE engines, perhaps with some trivial tweaks or conversions.  
However, there is no formal guidance recommending that pattern 
statements in standard modules are restricted to a subset of XML RE.


Hence, ideally I would like 6087bis to state that pattern statements 
SHOULD also conform to the following additional RE syntax restrictions, 
which I think should make them easy to convert to most other standard 
regex implementations (subject to unicode support limitations):


(1) Allow \d, \D, \s, \S, \w and \W; but not inside character classes.
(2) Disallow \i, \c; and their negative equivalents.
(3) Disallow character class subtraction (e.g. "[A-Z-[RW]]").
(4) Limit the supported unicode categories to only the following 8. Both 
\p and \P syntax is supported, but not inside character classes:

  \p{L} or any kind of letter from any language.
  \p{Ll} a lowercase letter that has an uppercase variant.
  \p{Lu} an uppercase letter that has a lowercase variant.
  \p{Z} any kind of whitespace or invisible separator.
  \p{Zs} a whitespace character that is invisible, but does take up space.
  \p{Zl} a line separator character U+2028.
  \p{N} any kind of numeric character in any script.
  \p{Nd}: a digit zero through nine in any script except ideographic
(5) Disallow matching of unicode blocks.
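
The five restrictions above could be checked mechanically. What follows is a minimal sketch, assuming Python 3 and only its standard library; `lint_pattern` and `ALLOWED_CATEGORIES` are hypothetical names, not part of pyang, yangre, or any other existing tool, and the scanner is deliberately simple rather than a full XSD regex parser.

```python
import re

# The eight Unicode categories from item (4) of the proposal above.
ALLOWED_CATEGORIES = {"L", "Ll", "Lu", "Z", "Zs", "Zl", "N", "Nd"}


def lint_pattern(pattern):
    """Return a list of findings for one YANG pattern string."""
    findings = []
    depth = 0  # nesting depth of [...] character classes
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            esc = pattern[i + 1]
            i += 2
            if esc in "iIcC":
                findings.append("rule 2: \\" + esc + " is disallowed")
            elif esc in "dDsSwW" and depth > 0:
                findings.append("rule 1: \\" + esc + " inside a character class")
            elif esc in "pP":
                # Read the {Name} part that follows \p or \P, if present.
                m = re.match(r"\{([^}]*)\}", pattern[i:])
                name = m.group(1) if m else ""
                if m:
                    i += m.end()
                if name.startswith("Is"):
                    findings.append("rule 5: unicode block " + name)
                elif name not in ALLOWED_CATEGORIES:
                    findings.append("rule 4: category " + name + " not in the allowed set")
                elif depth > 0:
                    findings.append("rule 4: \\" + esc + " inside a character class")
            continue
        if ch == "[":
            # A '[' directly after '-' inside a class starts a subtraction.
            if depth > 0 and pattern[i - 1] == "-":
                findings.append("rule 3: character class subtraction")
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        i += 1
    return findings
```

For instance, `lint_pattern("[A-Z-[RW]]")` reports a rule 3 finding, while the currency-code pattern discussed later in this thread, `[A-Z]{3}\s?\d+\.\d{2}`, produces no findings.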

Thanks,
Rob




> Educating _all_ module authors to write [0-9] instead of \d will
> likely be more expensive than improving the code of implementations
> that did not implement YANG entirely to accept \d.
>
> /js





Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-29 Thread Juergen Schoenwaelder
On Tue, Aug 29, 2017 at 09:39:05AM +0200, Benoit Claise wrote:
> In this discussion, let's keep in mind that the openconfig modules use the
> POSIX regex while the IETF uses the W3C regex.
> So for operators that have to deal with a mix of openconfig and IETF
> modules, this type of advice could be handy from a tooling point of view.
> Such advice, if not in RFC6087bis, could be provided in the yangre tool or
> in its GUI equivalent: https://yangcatalog.org/yangre

But we might make things worse. POSIX regex != XSD regex and trying to
make them look more similar likely worsens the confusion by making it
even more subtle that there are differences. From this perspective,
using \d instead of [0-9] is actually a good thing.
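
One of the subtle differences can be seen with a minimal sketch, assuming Python 3, whose `re` module treats `\d` in text patterns as any Unicode decimal digit. That is close to the XSD meaning of `\d` but is not an XSD engine, and the helper name `accepts` is hypothetical.

```python
import re

ascii_digits = "42"
# The digits four and two in the Arabic-Indic script (U+0664, U+0662).
arabic_indic_digits = "\u0664\u0662"


def accepts(pattern, value):
    # Whole-string match, mirroring how a YANG pattern is applied.
    return re.fullmatch(pattern, value) is not None


# Both spellings accept ASCII digits, but only \d accepts the non-ASCII ones.
results = {
    "\\d+ / ascii": accepts(r"\d+", ascii_digits),
    "[0-9]+ / ascii": accepts(r"[0-9]+", ascii_digits),
    "\\d+ / arabic-indic": accepts(r"\d+", arabic_indic_digits),
    "[0-9]+ / arabic-indic": accepts(r"[0-9]+", arabic_indic_digits),
}
```

So a module author who writes `[0-9]` and one who writes `\d` are not specifying the same value space, even though the two look interchangeable in ASCII-only test data.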

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-29 Thread Benoit Claise
In this discussion, let's keep in mind that the openconfig modules use 
the POSIX regex while the IETF uses the W3C regex.
So for operators that have to deal with a mix of openconfig and IETF 
modules, this type of advice could be handy from a tooling point of 
view. Such advice, if not in RFC6087bis, could be provided in the yangre 
tool or in its GUI equivalent: https://yangcatalog.org/yangre


Regards, Benoit

> On Mon, Aug 28, 2017 at 12:58:59PM +, Xufeng Liu wrote:
> > [Xufeng] [0-9] is still compliant with the XSD pattern specified by
> > YANG 1.0 and 1.1. Using [0-9] instead of [\d] will make the
> > implementations with native POSIX RegEx easier without the need for
> > a tool to inspect every element of the RegEx pattern.
>
> Yes, but then \d is legal in YANG (and it is used in a couple of
> published RFCs).
>
> Educating _all_ module authors to write [0-9] instead of \d will
> likely be more expensive than improving the code of implementations
> that did not implement YANG entirely to accept \d.
>
> /js





Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-28 Thread Juergen Schoenwaelder
On Mon, Aug 28, 2017 at 12:58:59PM +, Xufeng Liu wrote:
> 
> [Xufeng] [0-9] is still compliant with the XSD pattern specified by
> YANG 1.0 and 1.1. Using [0-9] instead of [\d] will make the
> implementations with native POSIX RegEx easier without the need for
> a tool to inspect every element of the RegEx pattern.

Yes, but then \d is legal in YANG (and it is used in a couple of
published RFCs).

Educating _all_ module authors to write [0-9] instead of \d will
likely be more expensive than improving the code of implementations
that did not implement YANG entirely to accept \d.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-28 Thread Xufeng Liu


> -Original Message-
> From: Juergen Schoenwaelder [mailto:j.schoenwael...@jacobs-university.de]
> Sent: Friday, August 25, 2017 8:53 AM
> To: Xufeng Liu 
> Cc: Per Hedeland ; Ladislav Lhotka ;
> 'netmod@ietf.org' 
> Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines
> 
> On Fri, Aug 25, 2017 at 12:40:18PM +, Xufeng Liu wrote:
> >
> > > I did not see a proposed change to the standard YANG specification
> > > regarding the regexp flavor, only a proposal that module authors
> > > SHOULD show consideration for implementations that don't comply with
> > > the standard.
> > >
> 
> > [Xufeng] This is the point.
> 
> Perhaps this did not come out properly.
> 
> Anyway, why would I as a YANG author have to replace \d with [0-9] so that
> implementations that can't handle \d are happy? An implementation can easily
> do this substitution itself before passing the pattern to the regex engine it
> prefers to use. (I think this is what libyang is actually doing, I think it
> uses pcre internally.)
> 
> YANG 1.0 and 1.1 are pretty clear which pattern syntax they use.
> Implementations should try to support that.

[Xufeng] [0-9] is still compliant with the XSD pattern specified by YANG 1.0
and 1.1. Using [0-9] instead of [\d] will make the implementations with native
POSIX RegEx easier without the need for a tool to inspect every element of the
RegEx pattern.

> 
> /js
> 
> --
> Juergen Schoenwaelder   Jacobs University Bremen gGmbH
> Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
> Fax:   +49 421 200 3103 <http://www.jacobs-university.de/>



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-25 Thread Juergen Schoenwaelder
On Fri, Aug 25, 2017 at 12:40:18PM +, Xufeng Liu wrote:
> 
> > I did not see a proposed change to the standard YANG specification
> > regarding the regexp flavor, only a proposal that module authors
> > SHOULD show consideration for implementations that don't comply
> > with the standard.
> >

> [Xufeng] This is the point.
 
Perhaps this did not come out properly.

Anyway, why would I as a YANG author have to replace \d with [0-9] so
that implementations that can't handle \d are happy? An implementation
can easily do this substitution itself before passing the pattern to
the regex engine it prefers to use. (I think this is what libyang is
actually doing, I think it uses pcre internally.)

YANG 1.0 and 1.1 are pretty clear which pattern syntax they use.
Implementations should try to support that.
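
The substitution mentioned above could look roughly like the following minimal sketch, assuming Python 3. The function name `replace_digit_escape` is hypothetical and this is not a description of what libyang actually does. Rewriting `\d` as `[0-9]` also narrows the pattern to ASCII digits, so it is only appropriate where that loss is acceptable.

```python
def replace_digit_escape(pattern):
    """Rewrite \\d as an explicit ASCII digit range for engines without \\d."""
    out = []
    depth = 0  # nesting depth of [...] character classes
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "d":
                # Inside a class only the range is needed, not new brackets.
                out.append("0-9" if depth > 0 else "[0-9]")
            else:
                # Copy any other escape (including an escaped backslash) as is.
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        out.append(ch)
        i += 1
    return "".join(out)
```

Under these assumptions `\d+\.\d{2}` becomes `[0-9]+\.[0-9]{2}`, and an escaped backslash followed by a literal `d` is left alone.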

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-25 Thread Xufeng Liu


> -Original Message-
> From: Carsten Bormann [mailto:c...@tzi.org]
> Sent: Thursday, August 24, 2017 2:11 PM
> To: Xufeng Liu 
> Cc: netmod@ietf.org
> Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines
> 
> On Aug 23, 2017, at 23:20, Xufeng Liu  wrote:
> >
> > 1.2.2. Avoid Unicode Characters
> >
> > Unicode characters are allowed in XSD regular expressions, but are not
> > supported in the POSIX variant. If possible, the model designers SHOULD
> > avoid using Unicode characters, such as: \p{L} and \p{N}.
> 
> All ASCII characters are Unicode Characters.
> I think ASCII characters are useful and should be allowed.
> 
> (Is this maybe about Unicode categories and character classes?)
[Xufeng] Certainly you are right. It was meant to say non-ASCII Unicode 
characters.

> 
> Grüße, Carsten



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-25 Thread Xufeng Liu


> -Original Message-
> From: Per Hedeland [mailto:p...@tail-f.com]
> Sent: Thursday, August 24, 2017 1:15 PM
> To: Ladislav Lhotka 
> Cc: 'netmod@ietf.org' ; Xufeng Liu 
> Subject: Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines
> 
> On 2017-08-24 17:54, Ladislav Lhotka wrote:
> > Per Hedeland  writes:
> >
> >> I strongly agree with all of Juergen's statements, and disagree also
> >> with the suggestion to include the parts of the text that he didn't
> >> specifically disagree with. And I'd like to add that the "lack of XSD
> >> support" argument is pretty weak - there exists at least one freely
> >> available implementation in the form of libxml2, which is actually
> >> present by default in basically all "normal" Linux installations.
> >> It is portable C code, and the parts needed for regexp matching
> >> amount to just above 100 kB of compiled code on an x86_64 CPU.
> >
> > I wouldn't be so strict here. Libxml2 has its share of problems - for
> > one, its "official" bindings do not support Python3, so e.g. in
> > Yangson I had to use PyXB package instead and pyang gives up pattern
> > validation in Python 3 entirely.
> 
> I don't really see how claiming that the "lack of XSD support" argument is
> weak amounts to being "strict" - and I suspect that the claim is valid even
> considering the amount of pattern-validating server and client
> implementations written in Python3. For a validation/translation tool such
> as pyang, having a minuscule C program that is invoked for validation would
> seem to be a reasonable implementation if no other options exist, though
> admittedly it is an annoyance.
> 
[Xufeng] Besides the language issue, there are situations where XML is not 
used, so that it is not desirable to include the libxml2 library.

> > That being said, there doesn't seem to be a clearly superior
> > replacement, and some aspects of XSD regexes, such as support for
> > Unicode and the absence of ^ and $ anchors, make a lot of sense in
> > YANG. So I am also not in favour of the proposed change.
> 
> I did not see a proposed change to the standard YANG specification regarding
> the regexp flavor, only a proposal that module authors SHOULD show
> consideration for implementations that don't comply with the standard.
> 
[Xufeng] This is the point.

> --Per
> 
> > BTW, it is actually a shame that there is no standard regex language
> > that could be easily used in all programming languages. Oh well ...
> >
> > Lada
> >
> >>
> >> --Per
> >>
> >> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
> >>> On Wed, Aug 23, 2017 at 09:20:36PM +, Xufeng Liu wrote:
> >>>> Members of Routing Area Yang DT have had some discussions about the
> >>>> handling of various variants of regular expressions. The following is
> >>>> the current state, and we are wondering whether this topic can be added
> >>>> to RFC6087bis:
> >>>>
> >>>> 1. Regular Expression Usage
> >>>> YANG uses regular expressions to restrict string values. Such a
> >>>> restriction can be a part of a "pattern" statement or a string matching
> >>>> function. [RFC7950] specifies that YANG regular expressions will conform
> >>>> to Appendix F in [XSD-TYPES].
> >>>> YANG models have been implemented in many different environments and
> >>>> the XSD variant of the regular expressions is not supported in many of
> >>>> these environments. There are currently more than a dozen popular
> >>>> regular expression variants implemented in various environments. While
> >>>> the usage of the XSD variant of regular expression described in
> >>>> [RFC7950] remains the preferred standard, a few conventions are
> >>>> prescribed to maximize the portability of YANG models between
> >>>> environments.
> >>>>
> >>>
> >>> I strongly disagree with this statement. The standard format are XSD
> >>> regular expressions. RFC 7950 section 9.4.5:
> >>>
> >>> The "pattern" statement, which is an optional substatement to the
> >>> "type" statement, takes as an argument a regular expression string,
> >>> as defined in [XSD-TYPES].
> >>>
> >>> There is no notion of a 'preferred' standard.
> >>>
> >>>> 1.1. Regular Expression Variant Choice Precedence
> >>>> YANG model designers SHOULD use the most portable syntax whenever
> >>>> possible. Under the condition that XSD compliance 

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-24 Thread Carsten Bormann
On Aug 23, 2017, at 23:20, Xufeng Liu  wrote:
> 
> 1.2.2. Avoid Unicode Characters
> 
> Unicode characters are allowed in XSD regular expressions, but are not 
> supported in the POSIX variant. If possible, the model designers SHOULD avoid 
> using Unicode characters, such as: \p{L} and \p{N}.

All ASCII characters are Unicode Characters.  
I think ASCII characters are useful and should be allowed.

(Is this maybe about Unicode categories and character classes?)

Grüße, Carsten



Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-24 Thread Per Hedeland
On 2017-08-24 17:54, Ladislav Lhotka wrote:
> Per Hedeland  writes:
> 
>> I strongly agree with all of Juergen's statements, and disagree also
>> with the suggestion to include the parts of the text that he didn't
>> specifically disagree with. And I'd like to add that the "lack of XSD
>> support" argument is pretty weak - there exists at least one freely
>> available implementation in the form of libxml2, which is actually
>> present by default in basically all "normal" Linux installations.
>> It is portable C code, and the parts needed for regexp matching amount
>> to just above 100 kB of compiled code on an x86_64 CPU.
> 
> I wouldn't be so strict here. Libxml2 has its share of problems - for
> one, its "official" bindings do not support Python3, so e.g. in Yangson
> I had to use PyXB package instead and pyang gives up pattern validation
> in Python 3 entirely. 

I don't really see how claiming that the "lack of XSD support" argument
is weak amounts to being "strict" - and I suspect that the claim is
valid even considering the amount of pattern-validating server and
client implementations written in Python3. For a validation/translation
tool such as pyang, having a minuscule C program that is invoked for
validation would seem to be a reasonable implementation if no other
options exist, though admittedly it is an annoyance.

> That being said, there doesn't seem to be a clearly superior
> replacement, and some aspects of XSD regexes, such as support for
> Unicode and the absence of ^ and $ anchors, make a lot of sense in
> YANG. So I am also not in favour of the proposed change.

I did not see a proposed change to the standard YANG specification
regarding the regexp flavor, only a proposal that module authors SHOULD
show consideration for implementations that don't comply with the
standard.

--Per

> BTW, it is actually a shame that there is no standard regex language
> that could be easily used in all programming languages. Oh well ...
> 
> Lada
> 
>>
>> --Per
>>
>> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
>>> On Wed, Aug 23, 2017 at 09:20:36PM +, Xufeng Liu wrote:
>>>> Members of Routing Area Yang DT have had some discussions about the
>>>> handling of various variants of regular expressions. The following is
>>>> the current state, and we are wondering whether this topic can be added
>>>> to RFC6087bis:
>>>>
>>>> 1. Regular Expression Usage
>>>> YANG uses regular expressions to restrict string values. Such a
>>>> restriction can be a part of a "pattern" statement or a string matching
>>>> function. [RFC7950] specifies that YANG regular expressions will conform
>>>> to Appendix F in [XSD-TYPES].
>>>> YANG models have been implemented in many different environments and the
>>>> XSD variant of the regular expressions is not supported in many of these
>>>> environments. There are currently more than a dozen popular regular
>>>> expression variants implemented in various environments. While the usage
>>>> of the XSD variant of regular expression described in [RFC7950] remains
>>>> the preferred standard, a few conventions are prescribed to maximize the
>>>> portability of YANG models between environments.
>>>>
>>>
>>> I strongly disagree with this statement. The standard format are XSD
>>> regular expressions. RFC 7950 section 9.4.5:
>>>
>>> The "pattern" statement, which is an optional substatement to the
>>> "type" statement, takes as an argument a regular expression string,
>>> as defined in [XSD-TYPES].
>>>
>>> There is no notion of a 'preferred' standard.
>>>
>>>> 1.1. Regular Expression Variant Choice Precedence
>>>> YANG model designers SHOULD use the most portable syntax whenever
>>>> possible. Under the condition that XSD compliance is satisfied and there
>>>> are multiple choices for a given expression, the following precedence
>>>> SHOULD be used to choose a regular expression variant:
>>>>
>>>> o  POSIX base
>>>> o  POSIX extended
>>>> o  BSD
>>>> o  GNU Regular Expression Extensions
>>>> o  C++ Regular Expressions with std::regex
>>>> o  Others
>>>
>>> Strongly disagree. You either write YANG or something different. There
>>> is no way to recognize what kind of regular expressions have been used
>>> by the model designer. The value of a standard is that everybody does
>>> the same.
>>>
>>>> For example, either \d or [0-9] can be used with equivalent semantics and
>>>> they are both compliant to [XSD-TYPES]. [0-9] is recommended because [0-9]
>>>> is supported by POSIX base but \d is not.
>>>>
>>>> 1.2.  Convention Guidelines
>>>> 1.2.1. Avoid Character Category Escapes
>>>> For example, in XSD regular expression, \d is a Character Category Escape
>>>> denoting the range of digits, i.e.,  [0-9]. To maximize portability, the
>>>> model designers SHOULD use [0-9] instead of \d.
>>>>
>>>> 1.2.2. Avoid Unicode Characters
>>>> Unicode characters are allowed in XSD regular expressions, but are not
>>>> supported in th

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-24 Thread Ladislav Lhotka
Per Hedeland  writes:

> I strongly agree with all of Juergen's statements, and disagree also
> with the suggestion to include the parts of the text that he didn't
> specifically disagree with. And I'd like to add that the "lack of XSD
> support" argument is pretty weak - there exists at least one freely
> available implementation in the form of libxml2, which is actually
> present by default in basically all "normal" Linux installations.
> It is portable C code, and the parts needed for regexp matching amount
> to just above 100 kB of compiled code on an x86_64 CPU.

I wouldn't be so strict here. Libxml2 has its share of problems - for
one, its "official" bindings do not support Python3, so e.g. in Yangson
I had to use PyXB package instead and pyang gives up pattern validation
in Python 3 entirely. 

That being said, there doesn't seem to be a clearly superior
replacement, and some aspects of XSD regexes, such as support for
Unicode and the absence of ^ and $ anchors, make a lot of sense in
YANG. So I am also not in favour of the proposed change.

BTW, it is actually a shame that there is no standard regex language
that could be easily used in all programming languages. Oh well ...

Lada

>
> --Per
>
> On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
>> On Wed, Aug 23, 2017 at 09:20:36PM +, Xufeng Liu wrote:
>>> Members of Routing Area Yang DT have had some discussions about the 
>>> handling of various variants of regular expressions. The followings are the 
>>> current state, and we are thinking that if this topic can be added to 
>>> RFC6087bis:
>>>
>>> 1. Regular Expression Usage
>>> YANG uses regular expressions to restrict string values. Such a restriction 
>>> can be a part of a "pattern" statement or a string matching function. 
>>> [RFC7950] specifies that YANG regular expressions will conform to Appendix 
>>> F in [XSD-TYPES].
>>> YANG models have been implemented in many different environments and the 
>>> XSD variant of the regular expressions is not supported in many of these 
>>> environments. There are currently more than a dozen popular regular 
>>> expression variants implemented in various environments. While the usage of 
>>> the XSD variant of regular expression described in [RFC7950] remains the 
>>> preferred standard, a few conventions are prescribed to maximize the 
>>> portability of YANG models between environments.
>>>
>> 
>> I strongly disagree with this statement. The standard format are XSD
>> regular expressions. RFC 7950 section 9.4.5:
>> 
>> The "pattern" statement, which is an optional substatement to the
>> "type" statement, takes as an argument a regular expression string,
>> as defined in [XSD-TYPES].
>> 
>> There is no notion of a 'preferred' standard.
>> 
>>> 1.1. Regular Expression Variant Choice Precedence
>>> YANG model designers SHOULD use the most portable syntax whenever possible. 
>>> Under the condition that XSD compliance is satisfied and there are multiple 
>>> choices for a given expression, the following precedence SHOULD be used to 
>>> choose a regular expressions variant:
>>>
>>> o  POSIX base
>>>
>>> o  POSIX extended
>>>
>>> o  BSD
>>>
>>> o  GNU Regular Expression Extensions
>>>
>>> o  C++ Regular Expressions with std::regex
>>>
>>> o  Others
>> 
>> Strongly disagree. You either write YANG or something different. There
>> is no way to recognize what kind of regular expressions have been used
>> by the model designer. The value of a standard is that everybody does
>> the same.
>> 
>>> For example, either \d or [0-9] can be used with equivalent semantics and 
>>> they are both compliant to [XSD-TYPES]. [0-9] is recommended because [0-9] 
>>> is supported by POSIX base but \d is not.
>>>
>>> 1.2.  Convention Guidelines
>>> 1.2.1. Avoid Character Category Escapes
>>> For example, in XSD regular expression, \d is a Character Category Escape 
>>> denoting the range of digits, i.e.,  [0-9]. To maximize portability, the 
>>> model designers SHOULD use [0-9] instead of \d.
>>>
>>> 1.2.2. Avoid Unicode Characters
>>> Unicode characters are allowed in XSD regular expressions, but are not 
>>> supported in the POSIX variant. If possible, the model designers SHOULD 
>>> avoid using Unicode characters, such as: \p{L} and \p{N}.
>>>
>>> 1.3. Conversion Tools
>>> Tools can automatically convert regular expressions from one variant to 
>>> another. When a YANG model is implemented in an environment where XSD 
>>> regular expressions are not supported, the recommended approach is to use a 
>>> conversion tool. For example, if needed, anchor position characters, i.e., 
>>> '^' and '$', can be added by a regular expression conversion tool.
>> 
>> If conversion tools exist that can convert, then by all means use XSD
>> in the YANG model and use tools to convert to whatever format your
>> implementation prefers to use.
>> 
>> /js
>> 
>
> ___
> netmod mailing list
> netmod@ietf.org
> https://www.ietf

Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-24 Thread Per Hedeland

I strongly agree with all of Juergen's statements, and disagree also
with the suggestion to include the parts of the text that he didn't
specifically disagree with. And I'd like to add that the "lack of XSD
support" argument is pretty weak - there exists at least one freely
available implementation in the form of libxml2, which is actually
present by default in basically all "normal" Linux installations.
It is portable C code, and the parts needed for regexp matching amount
to just above 100 kB of compiled code on an x86_64 CPU.

--Per

On 2017-08-24 08:09, Juergen Schoenwaelder wrote:
> On Wed, Aug 23, 2017 at 09:20:36PM +, Xufeng Liu wrote:
>> Members of Routing Area Yang DT have had some discussions about the handling of
>> various variants of regular expressions. The followings are the current state,
>> and we are thinking that if this topic can be added to RFC6087bis:
>>
>> 1. Regular Expression Usage
>> YANG uses regular expressions to restrict string values. Such a restriction can be a part
>> of a "pattern" statement or a string matching function. [RFC7950] specifies
>> that YANG regular expressions will conform to Appendix F in [XSD-TYPES].
>> YANG models have been implemented in many different environments and the XSD
>> variant of the regular expressions is not supported in many of these
>> environments. There are currently more than a dozen popular regular expression
>> variants implemented in various environments. While the usage of the XSD
>> variant of regular expression described in [RFC7950] remains the preferred
>> standard, a few conventions are prescribed to maximize the portability of YANG
>> models between environments.
>>
>
> I strongly disagree with this statement. The standard format are XSD
> regular expressions. RFC 7950 section 9.4.5:
>
> The "pattern" statement, which is an optional substatement to the
> "type" statement, takes as an argument a regular expression string,
> as defined in [XSD-TYPES].
>
> There is no notion of a 'preferred' standard.
>
>> 1.1. Regular Expression Variant Choice Precedence
>> YANG model designers SHOULD use the most portable syntax whenever possible.
>> Under the condition that XSD compliance is satisfied and there are multiple
>> choices for a given expression, the following precedence SHOULD be used to
>> choose a regular expressions variant:
>>
>> o  POSIX base
>>
>> o  POSIX extended
>>
>> o  BSD
>>
>> o  GNU Regular Expression Extensions
>>
>> o  C++ Regular Expressions with std::regex
>>
>> o  Others
>
> Strongly disagree. You either write YANG or something different. There
> is no way to recognize what kind of regular expressions have been used
> by the model designer. The value of a standard is that everybody does
> the same.
>
>> For example, either \d or [0-9] can be used with equivalent semantics and they
>> are both compliant to [XSD-TYPES]. [0-9] is recommended because [0-9] is
>> supported by POSIX base but \d is not.
>>
>> 1.2.  Convention Guidelines
>> 1.2.1. Avoid Character Category Escapes
>> For example, in XSD regular expression, \d is a Character Category Escape
>> denoting the range of digits, i.e.,  [0-9]. To maximize portability, the model
>> designers SHOULD use [0-9] instead of \d.
>>
>> 1.2.2. Avoid Unicode Characters
>> Unicode characters are allowed in XSD regular expressions, but are not
>> supported in the POSIX variant. If possible, the model designers SHOULD avoid
>> using Unicode characters, such as: \p{L} and \p{N}.
>>
>> 1.3. Conversion Tools
>> Tools can automatically convert regular expressions from one variant to
>> another. When a YANG model is implemented in an environment where XSD regular
>> expressions are not supported, the recommended approach is to use a conversion
>> tool. For example, if needed, anchor position characters, i.e., '^' and '$',
>> can be added by a regular expression conversion tool.
>
> If conversion tools exist that can convert, then by all means use XSD
> in the YANG model and use tools to convert to whatever format your
> implementation prefers to use.
>
> /js



___
netmod mailing list
netmod@ietf.org
https://www.ietf.org/mailman/listinfo/netmod


Re: [netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-23 Thread Juergen Schoenwaelder
On Wed, Aug 23, 2017 at 09:20:36PM +, Xufeng Liu wrote:
> Members of Routing Area Yang DT have had some discussions about the handling 
> of various variants of regular expressions. The followings are the current 
> state, and we are thinking that if this topic can be added to RFC6087bis:
> 
> 1. Regular Expression Usage
> YANG uses regular expressions to restrict string values. Such a restriction 
> can be a part of a "pattern" statement or a string matching function. 
> [RFC7950] specifies that YANG regular expressions will conform to Appendix F 
> in [XSD-TYPES].
> YANG models have been implemented in many different environments and the XSD 
> variant of the regular expressions is not supported in many of these 
> environments. There are currently more than a dozen popular regular 
> expression variants implemented in various environments. While the usage of 
> the XSD variant of regular expression described in [RFC7950] remains the 
> preferred standard, a few conventions are prescribed to maximize the 
> portability of YANG models between environments.
>

I strongly disagree with this statement. The standard format are XSD
regular expressions. RFC 7950 section 9.4.5:

   The "pattern" statement, which is an optional substatement to the
   "type" statement, takes as an argument a regular expression string,
   as defined in [XSD-TYPES].

There is no notion of a 'preferred' standard.

> 1.1. Regular Expression Variant Choice Precedence
> YANG model designers SHOULD use the most portable syntax whenever possible. 
> Under the condition that XSD compliance is satisfied and there are multiple 
> choices for a given expression, the following precedence SHOULD be used to 
> choose a regular expressions variant:
> 
> o  POSIX base
> 
> o  POSIX extended
> 
> o  BSD
> 
> o  GNU Regular Expression Extensions
> 
> o  C++ Regular Expressions with std::regex
> 
> o  Others

Strongly disagree. You either write YANG or something different. There
is no way to recognize what kind of regular expressions have been used
by the model designer. The value of a standard is that everybody does
the same.

> For example, either \d or [0-9] can be used with equivalent semantics and 
> they are both compliant to [XSD-TYPES]. [0-9] is recommended because [0-9] is 
> supported by POSIX base but \d is not.
> 
> 1.2.  Convention Guidelines
> 1.2.1. Avoid Character Category Escapes
> For example, in XSD regular expression, \d is a Character Category Escape 
> denoting the range of digits, i.e.,  [0-9]. To maximize portability, the 
> model designers SHOULD use [0-9] instead of \d.
> 
> 1.2.2. Avoid Unicode Characters
> Unicode characters are allowed in XSD regular expressions, but are not 
> supported in the POSIX variant. If possible, the model designers SHOULD avoid 
> using Unicode characters, such as: \p{L} and \p{N}.
> 
> 1.3. Conversion Tools
> Tools can automatically convert regular expressions from one variant to 
> another. When a YANG model is implemented in an environment where XSD regular 
> expressions are not supported, the recommended approach is to use a 
> conversion tool. For example, if needed, anchor position characters, i.e., 
> '^' and '$', can be added by a regular expression conversion tool.

If conversion tools exist that can convert, then by all means use XSD
in the YANG model and use tools to convert to whatever format your
implementation prefers to use.

/js

-- 
Juergen Schoenwaelder   Jacobs University Bremen gGmbH
Phone: +49 421 200 3587 Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103 



[netmod] Potential additions to rfc6087bis: RegEx guidelines

2017-08-23 Thread Xufeng Liu
Members of the Routing Area YANG DT have discussed how to handle the different 
variants of regular expressions. The following is the current state of that 
discussion, and we would like to ask whether this topic can be added to RFC6087bis:

1. Regular Expression Usage
YANG uses regular expressions to restrict string values. Such a restriction can 
be a part of a "pattern" statement or a string matching function. [RFC7950] 
specifies that YANG regular expressions will conform to Appendix F in 
[XSD-TYPES].
YANG models have been implemented in many different environments and the XSD 
variant of the regular expressions is not supported in many of these 
environments. There are currently more than a dozen popular regular expression 
variants implemented in various environments. While the usage of the XSD 
variant of regular expression described in [RFC7950] remains the preferred 
standard, a few conventions are prescribed to maximize the portability of YANG 
models between environments.

1.1. Regular Expression Variant Choice Precedence
YANG model designers SHOULD use the most portable syntax whenever possible. 
Under the condition that XSD compliance is satisfied and there are multiple 
choices for a given expression, the following precedence SHOULD be used to 
choose a regular expression variant:

o  POSIX base

o  POSIX extended

o  BSD

o  GNU Regular Expression Extensions

o  C++ Regular Expressions with std::regex

o  Others

For example, either \d or [0-9] can be used with equivalent semantics and they 
are both compliant with [XSD-TYPES]. [0-9] is recommended because [0-9] is 
supported by POSIX base but \d is not.

1.2.  Convention Guidelines
1.2.1. Avoid Character Category Escapes
For example, in XSD regular expressions, \d is a Character Category Escape 
denoting the range of digits, i.e., [0-9]. To maximize portability, model 
designers SHOULD use [0-9] instead of \d.
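
One caveat on the equivalence claimed above: [XSD-TYPES] defines \d as the
Unicode decimal-digit category (\p{Nd}), so it accepts more than [0-9]. The
sketch below uses Python's standard re module only as a convenient engine
whose \d behaves the same way for text strings; it is not an XSD
implementation, and the helper name matches is made up for illustration.

```python
import re

ascii_digits = "2017"
# The same number written with ARABIC-INDIC DIGITs (U+0660..U+0669).
arabic_indic_digits = "\u0662\u0660\u0661\u0667"


def matches(pattern, value):
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


print(matches(r"\d+", ascii_digits))            # True
print(matches(r"\d+", arabic_indic_digits))     # True: \d is Unicode-aware
print(matches(r"[0-9]+", arabic_indic_digits))  # False: ASCII digits only
```

If only ASCII digits are intended, [0-9] states that intent explicitly,
independent of the portability argument.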

1.2.2. Avoid Unicode Characters
Unicode characters are allowed in XSD regular expressions, but are not 
supported in the POSIX variant. If possible, the model designers SHOULD avoid 
using Unicode characters, such as: \p{L} and \p{N}.
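
The portability concern is not limited to POSIX. As a rough illustration,
Python's standard re module also rejects \p{...} escapes. The sketch below
simply tries to compile a few patterns; the helper name compiles is
hypothetical, and other engines (or third-party Python packages) may behave
differently.

```python
import re


def compiles(pattern):
    """Return True if Python's standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


for pattern in (r"[a-zA-Z]+", r"\p{L}+", r"\p{N}+"):
    print(pattern, compiles(pattern))
```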

1.3. Conversion Tools
Tools can automatically convert regular expressions from one variant to 
another. When a YANG model is implemented in an environment where XSD regular 
expressions are not supported, the recommended approach is to use a conversion 
tool. For example, if needed, anchor position characters, i.e., '^' and '$', 
can be added by a regular expression conversion tool.
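
A minimal sketch of the anchoring step, assuming the pattern already stays
within syntax shared by XSD and the target engine (here Python's re module).
The function name to_python_regex is hypothetical. Because '$' in Python also
matches just before a trailing newline, the sketch uses \A and \Z rather than
'^' and '$'. It does not translate any other XSD-specific syntax.

```python
import re


def to_python_regex(xsd_pattern):
    """Anchor an implicitly anchored XSD pattern for a search-style engine."""
    # The non-capturing group keeps a top-level alternation such as "a|b"
    # from binding to only one of the anchors.
    return r"\A(?:" + xsd_pattern + r")\Z"


raw = "[0-9a-fA-F]*"
anchored = to_python_regex(raw)

print(bool(re.search(raw, "zz1Fzz")))       # True: unanchored search succeeds
print(bool(re.search(anchored, "zz1Fzz")))  # False: whole string must match
print(bool(re.search(anchored, "1F")))      # True
```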

1.4. Validation Tools
Tools can be used to validate regular expressions in YANG files. The following 
are some of these tools:

o  YANG W3C Regex Expression Validator: https://yangcatalog.org/yangre

This is an online tool with a web interface.

o  yangre as a part of the libyang package:
https://github.com/CESNET/libyang

This is an open source tool with a command line interface.

Usage:

    yangre [-hvV] -p <regexp1> [-i] [-p <regexp2> [-i] ...] <string>

Returns 0 if string matches the pattern(s), 1 if not and -1 on error.

Options:
  -h, --help              Show this help message and exit.
  -v, --version           Show version number and exit.
  -V, --verbose           Print the processing information.
  -i, --invert-match      Invert-match modifier for the closest preceding
                          pattern.
  -p, --pattern="REGEXP"  Regular expression including the quoting,
                          which is applied the same way as in a YANG module.

Examples:
  pattern "[0-9a-fA-F]*";      -> yangre -p '"[0-9a-fA-F]*"' '1F'
  pattern '[a-zA-Z0-9\-_.]*';  -> yangre -p "'[a-zA-Z0-9\-_.]*'" 'a-b'
  pattern [xX][mM][lL].*;      -> yangre -p '[xX][mM][lL].*' 'xml-encoding'
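
For quick experiments without yangre installed, the return-value convention
described above can be mimicked in a few lines. This is only a stand-in built
on Python's re module, not libyang's implementation: the function name
yangre_like and its list of (pattern, invert) pairs are hypothetical, the
YANG quoting step is skipped, and the answers are only meaningful for
patterns that are valid in both XSD and Python syntax.

```python
import re


def yangre_like(patterns, value):
    """Return 0 if value satisfies every (pattern, invert) pair, 1 if not,
    and -1 if a pattern cannot be compiled."""
    try:
        for pattern, invert in patterns:
            # XSD patterns are implicitly anchored, hence fullmatch().
            matched = re.fullmatch(pattern, value) is not None
            if matched == invert:
                return 1
        return 0
    except re.error:
        return -1


# The three example patterns from the help text, without invert-match.
print(yangre_like([("[0-9a-fA-F]*", False)], "1F"))
print(yangre_like([(r"[a-zA-Z0-9\-_.]*", False)], "a-b"))
print(yangre_like([("[xX][mM][lL].*", False)], "xml-encoding"))
```

Passing True as the second element of a pair plays the role of -i for that
pattern.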



References

[XSD-TYPES] Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes Second 
Edition", World Wide Web Consortium Recommendation REC-xmlschema-2-20041028, 
October 2004.

[POSIX] IEEE Std 1003.1-2008, 2016.

[BSD] Regular Expression.

[C++] ISO/IEC DIS 14882: Programming Languages - C++, 2017.

Thanks,
- Xufeng