[caiman-discuss] Please review: 4325 Better syntactic treatment of IP and MAC address AI criteria

Jack Schwartz Sat, 04 Apr 2009 07:29:37 -0700

Hi Clay.

Thanks for your in-depth review, and for your comments and suggestions.  
My responses are inline.


Delta webrev is at: 
http://cr.opensolaris.org/~schwartz/090402.1/webrev.incr_1_2/
New webrev is at: http://cr.opensolaris.org/~schwartz/090402.1/webrev/

Clay Baenziger wrote:
> Hi Jack,
>     Thank you very much for doing this work! And thank you too for all 
> the cool ideas as we've talked back and forth on these changes. My 
> ideas are below:
>
> publish-manifest:
>     *Instead of using 9999... and 0 for infinity and a small value,
>      talking with Drew Fisher here in BRM, we should be using:
>      sys.maxint and -sys.maxint-1. I was originally not
>      sure if this worked on Windows but it'll work on the Unix world
>      and Windows. So, the hack should go away. (lines 204 & 205 could
>      go away then)
I have a few comments about this.

1)  Since we convert back and forth between integers and digit strings,I 
don't think sys.maxint will be big enough. 
 >>> import sys
 >>> print sys.maxint
2147483647

 >>> a=2147483647+1
 >>> print a
2147483648

For example if we store an IP address of 255.255.255.255 (I know this is 
unusual, but it is possible...) then the number from the digit string 
without colons is way larger than maxint.

2) The maximum integral value is greater than maxint, and we could end 
up with a stored value > maxint.

The python reference at www.python.org says that even a long is 
"implemented using long in C, which gives them at least 32 bits of 
precision".  To me that sounds like some platforms may not support more 
than 32 bits.  That said, comparing something with "999......999," which 
we are already doing, is ultimately going to go over 32 bits too...

I propose we replace INFINITY="999999....999" with

INFINITY=str(0xFFFFFFFFFFFFFFFF)

which is the digit string "18446744073709551615" or a 64 bit long

which is more reasonable, and is still more than large enough to cover 
12-digit hex numbers.  It is still not perfect though, since python lets 
you do the following:

 >>> 
a=str(0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
 >>> print  a
115792089237316195423570985008687907853269984665640564039457584007913129639935

The real solution, which should be documented in 7876, is to store digit 
strings in the database instead of integers, as we talked about Thursday 
in our video call.  This not only handles the current state of affairs, 
but plans ahead for if/when we implement storage of IPV6 addresses (with 
24 (?) hex digits).  I will be sure to update that bug report with this 
information.
>     *line 183 comment nit: "check [that] ranges are valid"
OK.

While doing this, I noticed comment references of "-inf".  I changed 
these references to 0, since that's what the code used before I started 
working on this bug, and what it still uses.  If you think it should be 
"-inf" instead then I'll file a bug to fix this later.  Please let me know.
>     *In many of the checks for unbounded since forcing case is painful
>      to the user, could you do a value.lower() or
>      manifestCriteria.lower() before doing the compare?
All "unbounded"s are already stored as lower case.  The conversion is 
done in verifyXML.py, here at lines 250-254:

                                # Handle "unbounded" keyword; and pass 
lowercase
                                lowered_value = one_value.lower()
                                if (lowered_value == "unbounded"):
                                        new_values += lowered_value
>
> AI_database.py:
>     *This is perhaps a nit since it's more of a stylistic and reuse
>      idea for formatValue(), I'd like to see this take a dictionary of
>      keys and values, so that it can handle all keys and values and
>      only process the few it needs opposed to running for every key.
>      I.e.:
>      for key in ("mac", "ipv4", "ipv6"...):
>         for prefix in ('', 'MIN', 'MAX'):
>              try:
>                 values[prefix+key]...
>                 [...]
>                 continue
>              except: KeyError
>                 pass
>      Of course, however you want, as this pseudocode is still kludgy,
>      but I'd like to see it be more efficient than just one key
>      per call of the function. (Then, list-manifest.py line 174 could
>      just have:
>      criteria=formatValues(criteria.keys(),criteria) and
>      webserver.py line 124 could just have:
>      crit=formatValues(crit.keys(),crit), for example.)
I tried this and ran into the problem of trying to change an immutable 
value*.  This means I'd have to create a new dictionary with new 
values.  This can be done, but I'm thinking it is probably more 
efficient to leave the code as is, since there aren't that many criteria 
to check, there are not that many formatValue calls to make.  A few 
calls to formatValue seems more efficient than recreating a dictionary.

* dict[key]=newValue gave
  TypeError: object does not support item assignment
and
  trying to do del dict[key] before reassignment, gave
AttributeError: 'pysqlite2.dbapi2.Row' object has no attribute 'iteritems'
when I tried to reassign
>     *Line 530: Elsewhere MIN or MAC are stripped with
>      var.replace('MIN', '', 1) though I like the key = key[3:] perhaps
>      consistency is better? (Feel free to ignore this if you think you
>      should, since you have more experience than I do in maintaining big
>      code bases.)
OK.
>     *Line 533: This padding will produce a padding like 0001722024199
>      where as the padding necessary is by octet 172020024199 (for an
>      address of 172.20.24.199). Also, I'm not sure why you're padding
>      values here, since the string is padded in the database already.
Oops.  Thanks.  Removed the padding from MAC addresses.
>
> Criteria_schema.rng:
>     *Thank you for setting the style to a more consistent one with how
>      other schemas have developed in the gate
>     *Line 55,58 Nit: I find unless a block is really big, making it a
>      name reference detracts from the schema readability from all the
>      jumping around. Do you have a tool to expand or collapse them
>      while reading?
I don't have a tool and I admit that schema readability can be difficult 
with many small blocks.  IMHO, I think it's worse without them though.  
I also think that taking a more "programmatic" approach to schemas 
(treating blocks in the schema like functions are in a program) helps 
break the schema into more manageable, more readable units.  The rules I 
code by apply well to schemas.  Some of these are:
1) Lines are 80 chars
2) Nesting levels get indented by one tab and continuations by four spaces
3) When nesting gets so deep that code gets all scrunched up on the 
right, it's time to revamp how that block of code is written, or else 
it's time to create a function (which lets me then start on the left again).
These rules fit me well for "coding" schemas too, and seem to make them 
more manageable for me.
>
>     *Lines 54-63: We only support one SC manifest and we need one SC
>      manifest with an AI manifest, let's take the zeroOrMore tag off
>      and wrap the AI and SC manifest in an optional tag:
>     <optional>
>             <ref name="nm_ai_manifest_src"/>
>             <ref name="nm_sc_manifest_src"/>
>         </optional>
Done.
>
> verifyXML.py:
>     *For __checkIPv4 and __checkMAC, it's too bad these don't return
>      the same error one would get for a malformed MAC or IPv4 were
>      they in a range.
Yes, I agree.  But at least their errors are caught and processed in an 
understandable way.
> If we used LXML to parse the values they could
>      by doing something like (for IPv4) -- this would also help when
>      we need to support IPv6 since they're a lot more complex:
>      # Load in the schema representation of an IPv4 address
>      s = StringIO('''\
>      <element name="a" xmlns="http://relaxng.org/ns/structure/1.0";>
>      <data type="token">
>      <param name="pattern">
>      
> ((25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])
>  
>
>      </param>
>      </data>
>      ''')
>      # Parse schema
>      ipv4_schema=etree.parse(s)
>      relaxng = lxml.etree.RelaxNG(ipv4_schema)
>      # from __checkIPv4, values is the IP address in question
>      data=stringIO(values)
>      try:
>                  root = lxml.etree.parse(data)
>          except lxml.etree.XMLSyntaxError, e:
>                  return e.error_log.last_error
>      [the you'd just need to do the rest of the splitting and padding]
>      which could be as simple as (in fact this would be preferable
>      to calling newval += ... even in the current form):
>      return(''.join([octet.zfill(3) for octet in values.split(".")]))
While this may be true, I'd rather skip this for now due to time 
constraints.  If you feel it is important enough, please file a bug.
>
>     *Line 161: Nit: Could you hand in criteriaRoot and the AI database
>      to make the dependencies of this function more obvious when
>      called?
Done.
>     *Lines 216-220: The schema should have line 146 changed to:
>      <data type="string"/> from <text/> and then this is redundant and
>      handled in XML validation
I think you meant line 123 here, the value type of <element 
name="value">, since the other text/ entries were for manifest names.

When I change this, lines 216-220 still fire when I do this in the manifest:

    <ai_criteria name="ipv4">
        <value>
                1.2.3.4 5.6.7.8
        </value>

so lines 216-220 have their use.  Leaving them in.

I did a fast web search and you are right in that "string" does not do 
whitespace normalization and "text" does.  However, it may be that 
normalization strips whitespace from the ends of a string, not from the 
middle, which is why I saw no effect in my test.

>     *Line 222-224: This is not necessary if a simple check is put in
>      publish-manifest (original line numbers 116 and 223) to report a
>      criteria is not a value criteria or range criteria, respectively.
Part of the reason for this function is to do validation which is missed 
by the schema, so the check seems appropriate here.
>     *Line 225-228: In my testing with a range I can't get more than
>      two lines to pass the XML validator, is this necessary?
Strictly speaking you are correct.  I had put this in since I thought I 
remembered criteria as being extensible and wanted this as a double 
check for user errors in the criteria manifest.  However, thinking more, 
extending criteria is difficult for the user to do, since there would 
need to be new database fields added.  Since the user cannot do this, 
I'll remove lines 225-228.
>     *Line 229-232: This should be handled in the schema, as only a
>      <range> or <value> tag are allowed?
OK.  Removed.
>
>     Otherwise, everything looks reasonable. Let me know what you think 
> of these observations, as there's a lot of ideas here and you have 
> better experience than me which should be pursued and which are nits. 
> Please call me with questions too.
Thanks again for your review.  Thanks also for teaching me some new 
python tricks, both during the code review and while studying your code 
during my initial coding process.

    Thanks,
    Jack
>
>                             Thank you,
>                             Clay
>
> On Thu, 2 Apr 2009, Jack Schwartz wrote:
>
>> Hi everyone.
>>
>> Here is a webrev of my proposed fixes for:
>>   4325 Better syntactic treatment of IP and MAC address AI criteria
>>
>> A medium-level description of the changes is in the bug-report:
>>   http://defect.opensolaris.org/bz/show_bug.cgi?id=4325
>>
>> Webrev:
>>   http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>>
>> Testing done:
>>
>> Ran publish manifest and verified with webserver:
>> - MAC: min/unbounded, IP: unbounded/max, network: min/unbounded,
>>       mem: single <value> for range, minMAC has single digit btwn colons
>> - range with three values: correctly failed validation
>> - range with two unbounded values: correctly failed validation
>> - range with 0 minimum: correctly passed validation
>> - single IP addr given as a <range>: corectly failed validation
>> - pair of values given as a <value>: correctly failed validation
>> - 2 values for non range criterion, surrounded by <value>: correctly 
>> failed validation
>> - single value for non range criterion, surrounded by <value>: passed 
>> validation
>> - 2 values for non range criterion, surrounded by <range>: correctly 
>> failed validation
>> - Try a combination with a non-numeric value (arch=sparc): worked
>> Also verified list-manifests displayed data correctly.
>>
>> I'm requesting that Clay review, as he's most familiar with the code 
>> I've changed and he has a heads-up that it's coming.  Others are 
>> invited to review as well.
>>
>> Tomorrow I'll verify that an install starts as it should, but I have 
>> confidence it will work as the webserver output looks fine.
>>
>> Sorry to rush.  I'm requesting a review ASAP, by COB tomorrow if 
>> possible. I'm supposed to be done on Friday, but to give less than 1 
>> day for this review is not really realistic.
>>
>>   Thanks,
>>   Jack
>>

[caiman-discuss] Please review: 4325 Better syntactic treatment of IP and MAC address AI criteria

Reply via email to