[caiman-discuss] Please review: 4325 Better syntactic treatment of IP and MAC address AI criteria

Clay Baenziger Sun, 05 Apr 2009 23:51:19 -0600 (MDT)

Hi Jack,
        Comments in-line.
                                        Thank you,
                                        Clay


On Sat, 4 Apr 2009, Jack Schwartz wrote:

>
> Hi Clay.
>
> Thanks for your in-depth review, and for your comments and suggestions.  My 
> responses are inline.
>
> Delta webrev is at: 
> http://cr.opensolaris.org/~schwartz/090402.1/webrev.incr_1_2/
> New webrev is at: http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>
> Clay Baenziger wrote:
>> Hi Jack,
>>     Thank you very much for doing this work! And thank you too for all the 
>> cool ideas as we've talked back and forth on these changes. My ideas are 
>> below:
>> 
>> publish-manifest:
>>     *Instead of using 9999... and 0 for infinity and a small value,
>>      talking with Drew Fisher here in BRM, we should be using:
>>      sys.maxint and -sys.maxint-1. I was originally not
>>      sure if this worked on Windows but it'll work on the Unix world
>>      and Windows. So, the hack should go away. (lines 204 & 205 could
>>      go away then)
> I have a few comments about this.
>
> 1)  Since we convert back and forth between integers and digit strings,I 
> don't think sys.maxint will be big enough. >>> import sys
>>>> print sys.maxint
> 2147483647
>
>>>> a=2147483647+1
>>>> print a
> 2147483648
>
> For example if we store an IP address of 255.255.255.255 (I know this is 
> unusual, but it is possible...) then the number from the digit string without 
> colons is way larger than maxint.
>
> 2) The maximum integral value is greater than maxint, and we could end up 
> with a stored value > maxint.
>
> The python reference at www.python.org says that even a long is "implemented 
> using long in C, which gives them at least 32 bits of precision".  To me that 
> sounds like some platforms may not support more than 32 bits.  That said, 
> comparing something with "999......999," which we are already doing, is 
> ultimately going to go over 32 bits too...
>
> I propose we replace INFINITY="999999....999" with
>
> INFINITY=str(0xFFFFFFFFFFFFFFFF)
>
> which is the digit string "18446744073709551615" or a 64 bit long
>
> which is more reasonable, and is still more than large enough to cover 
> 12-digit hex numbers.  It is still not perfect though, since python lets you 
> do the following:
>
>>>> a=str(0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
>>>> print  a
> 115792089237316195423570985008687907853269984665640564039457584007913129639935
>
> The real solution, which should be documented in 7876, is to store digit 
> strings in the database instead of integers, as we talked about Thursday in 
> our video call.  This not only handles the current state of affairs, but 
> plans ahead for if/when we implement storage of IPV6 addresses (with 24 (?) 
> hex digits).  I will be sure to update that bug report with this information.

I agree sys.maxint was not the right answer. It looks like Python 2.6 
introduced a sys.float_info.max (http://docs.python.org/library/sys.html) 
but we're still using 2.4.x. How about INFINITY=long(0xFFFFFFFFFFFFFFFF), 
as I'm not sure why we'd want the str object for numeric comparisons?

>>     *line 183 comment nit: "check [that] ranges are valid"
> OK.
>
> While doing this, I noticed comment references of "-inf".  I changed these 
> references to 0, since that's what the code used before I started working on 
> this bug, and what it still uses.  If you think it should be "-inf" instead 
> then I'll file a bug to fix this later.  Please let me know.

I think the comments(?) which had -inf in them are more accurate. As 
a negative number would still be okay. (Though we don't have a criteria at 
the moment where a negative number would make sense -- and I can't come up 
with such a case -- but the code currently works that way.)

>>     *In many of the checks for unbounded since forcing case is painful
>>      to the user, could you do a value.lower() or
>>      manifestCriteria.lower() before doing the compare?
> All "unbounded"s are already stored as lower case.  The conversion is done in 
> verifyXML.py, here at lines 250-254:
>
>                               # Handle "unbounded" keyword; and pass 
> lowercase
>                               lowered_value = one_value.lower()
>                               if (lowered_value == "unbounded"):
>                                       new_values += lowered_value

Ah, okay, I had seen that and not thought on it much further. Perhaps a 
comment referring to where the case processing is done would be good?

>> 
>> AI_database.py:
>>     *This is perhaps a nit since it's more of a stylistic and reuse
>>      idea for formatValue(), I'd like to see this take a dictionary of
>>      keys and values, so that it can handle all keys and values and
>>      only process the few it needs opposed to running for every key.
>>      I.e.:
>>      for key in ("mac", "ipv4", "ipv6"...):
>>         for prefix in ('', 'MIN', 'MAX'):
>>              try:
>>                 values[prefix+key]...
>>                 [...]
>>                 continue
>>              except: KeyError
>>                 pass
>>      Of course, however you want, as this pseudocode is still kludgy,
>>      but I'd like to see it be more efficient than just one key
>>      per call of the function. (Then, list-manifest.py line 174 could
>>      just have:
>>      criteria=formatValues(criteria.keys(),criteria) and
>>      webserver.py line 124 could just have:
>>      crit=formatValues(crit.keys(),crit), for example.)
> I tried this and ran into the problem of trying to change an immutable 
> value*.  This means I'd have to create a new dictionary with new values. 
> This can be done, but I'm thinking it is probably more efficient to leave the 
> code as is, since there aren't that many criteria to check, there are not 
> that many formatValue calls to make.  A few calls to formatValue seems more 
> efficient than recreating a dictionary.
>
> * dict[key]=newValue gave
> TypeError: object does not support item assignment
> and
> trying to do del dict[key] before reassignment, gave
> AttributeError: 'pysqlite2.dbapi2.Row' object has no attribute 'iteritems'
> when I tried to reassign

Ah, yes, they're all the PySQLite2 objects in that dictionary, so you'd 
have to delete the key and re-create it. I think you're right, let's move 
on.

>>     *Line 530: Elsewhere MIN or MAC are stripped with
>>      var.replace('MIN', '', 1) though I like the key = key[3:] perhaps
>>      consistency is better? (Feel free to ignore this if you think you
>>      should, since you have more experience than I do in maintaining big
>>      code bases.)
> OK.
>>     *Line 533: This padding will produce a padding like 0001722024199
>>      where as the padding necessary is by octet 172020024199 (for an
>>      address of 172.20.24.199). Also, I'm not sure why you're padding
>>      values here, since the string is padded in the database already.
> Oops.  Thanks.  Removed the padding from MAC addresses.

I forgot to point out the same for line 537 (latest line number) doing 
the same with IPv4.

>> 
>> Criteria_schema.rng:
>>     *Thank you for setting the style to a more consistent one with how
>>      other schemas have developed in the gate
>>     *Line 55,58 Nit: I find unless a block is really big, making it a
>>      name reference detracts from the schema readability from all the
>>      jumping around. Do you have a tool to expand or collapse them
>>      while reading?
> I don't have a tool and I admit that schema readability can be difficult with 
> many small blocks.  IMHO, I think it's worse without them though.  I also 
> think that taking a more "programmatic" approach to schemas (treating blocks 
> in the schema like functions are in a program) helps break the schema into 
> more manageable, more readable units.  The rules I code by apply well to 
> schemas.  Some of these are:
> 1) Lines are 80 chars
> 2) Nesting levels get indented by one tab and continuations by four spaces
> 3) When nesting gets so deep that code gets all scrunched up on the right, 
> it's time to revamp how that block of code is written, or else it's time to 
> create a function (which lets me then start on the left again).
> These rules fit me well for "coding" schemas too, and seem to make them more 
> manageable for me.
>>
>>     *Lines 54-63: We only support one SC manifest and we need one SC
>>      manifest with an AI manifest, let's take the zeroOrMore tag off
>>      and wrap the AI and SC manifest in an optional tag:
>>     <optional>
>>             <ref name="nm_ai_manifest_src"/>
>>             <ref name="nm_sc_manifest_src"/>
>>         </optional>
> Done.
>> 
>> verifyXML.py:
>>     *For __checkIPv4 and __checkMAC, it's too bad these don't return
>>      the same error one would get for a malformed MAC or IPv4 were
>>      they in a range.
> Yes, I agree.  But at least their errors are caught and processed in an 
> understandable way.
>> If we used LXML to parse the values they could
>>      by doing something like (for IPv4) -- this would also help when
>>      we need to support IPv6 since they're a lot more complex:
>>      # Load in the schema representation of an IPv4 address
>>      s = StringIO('''\
>>      <element name="a" xmlns="http://relaxng.org/ns/structure/1.0";>
>>      <data type="token">
>>      <param name="pattern">
>>      
>> ((25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])
>>      </param>
>>      </data>
>>      ''')
>>      # Parse schema
>>      ipv4_schema=etree.parse(s)
>>      relaxng = lxml.etree.RelaxNG(ipv4_schema)
>>      # from __checkIPv4, values is the IP address in question
>>      data=stringIO(values)
>>      try:
>>                  root = lxml.etree.parse(data)
>>          except lxml.etree.XMLSyntaxError, e:
>>                  return e.error_log.last_error
>>      [the you'd just need to do the rest of the splitting and padding]
>>      which could be as simple as (in fact this would be preferable
>>      to calling newval += ... even in the current form):
>>      return(''.join([octet.zfill(3) for octet in values.split(".")]))
> While this may be true, I'd rather skip this for now due to time constraints. 
> If you feel it is important enough, please file a bug.
>>
>>     *Line 161: Nit: Could you hand in criteriaRoot and the AI database
>>      to make the dependencies of this function more obvious when
>>      called?
> Done.
>>     *Lines 216-220: The schema should have line 146 changed to:
>>      <data type="string"/> from <text/> and then this is redundant and
>>      handled in XML validation
> I think you meant line 123 here, the value type of <element name="value">, 
> since the other text/ entries were for manifest names.
>
> When I change this, lines 216-220 still fire when I do this in the manifest:
>
>   <ai_criteria name="ipv4">
>       <value>
>               1.2.3.4 5.6.7.8
>       </value>
>
> so lines 216-220 have their use.  Leaving them in.
>
> I did a fast web search and you are right in that "string" does not do 
> whitespace normalization and "text" does.  However, it may be that 
> normalization strips whitespace from the ends of a string, not from the 
> middle, which is why I saw no effect in my test.

I had not thought about strip() splitting on a string with whitespace in 
it. I think we need to support such a string (i.e. if we support output 
from smbios(1) for a machine's manufacturer and model we'd have such 
values as "Sun Microsystems" and "Ultra 40" or "ThinkPad X200". Please 
remove or comment the check so that it can be identifed as allowing 
strings with no white space (please update bug 7955 too).

Again, the LXML parsing from above could be used for IP and MAC criteria 
and MEM could be validated against a <data type="integer"/> definition, 
etc.

>>     *Line 222-224: This is not necessary if a simple check is put in
>>      publish-manifest (original line numbers 116 and 223) to report a
>>      criteria is not a value criteria or range criteria, respectively.
> Part of the reason for this function is to do validation which is missed by 
> the schema, so the check seems appropriate here.

Okay

>>     *Line 225-228: In my testing with a range I can't get more than
>>      two lines to pass the XML validator, is this necessary?
> Strictly speaking you are correct.  I had put this in since I thought I 
> remembered criteria as being extensible and wanted this as a double check for 
> user errors in the criteria manifest.  However, thinking more, extending 
> criteria is difficult for the user to do, since there would need to be new 
> database fields added.  Since the user cannot do this, I'll remove lines 
> 225-228.
>>     *Line 229-232: This should be handled in the schema, as only a
>>      <range> or <value> tag are allowed?
> OK.  Removed.
>>
>>     Otherwise, everything looks reasonable. Let me know what you think of 
>> these observations, as there's a lot of ideas here and you have better 
>> experience than me which should be pursued and which are nits. Please call 
>> me with questions too.
> Thanks again for your review.  Thanks also for teaching me some new python 
> tricks, both during the code review and while studying your code during my 
> initial coding process.
>
>   Thanks,
>   Jack
>>
>>                             Thank you,
>>                             Clay
>> 
>> On Thu, 2 Apr 2009, Jack Schwartz wrote:
>> 
>>> Hi everyone.
>>> 
>>> Here is a webrev of my proposed fixes for:
>>>   4325 Better syntactic treatment of IP and MAC address AI criteria
>>> 
>>> A medium-level description of the changes is in the bug-report:
>>>   http://defect.opensolaris.org/bz/show_bug.cgi?id=4325
>>> 
>>> Webrev:
>>>   http://cr.opensolaris.org/~schwartz/090402.1/webrev/
>>> 
>>> Testing done:
>>> 
>>> Ran publish manifest and verified with webserver:
>>> - MAC: min/unbounded, IP: unbounded/max, network: min/unbounded,
>>>       mem: single <value> for range, minMAC has single digit btwn colons
>>> - range with three values: correctly failed validation
>>> - range with two unbounded values: correctly failed validation
>>> - range with 0 minimum: correctly passed validation
>>> - single IP addr given as a <range>: corectly failed validation
>>> - pair of values given as a <value>: correctly failed validation
>>> - 2 values for non range criterion, surrounded by <value>: correctly 
>>> failed validation
>>> - single value for non range criterion, surrounded by <value>: passed 
>>> validation
>>> - 2 values for non range criterion, surrounded by <range>: correctly 
>>> failed validation
>>> - Try a combination with a non-numeric value (arch=sparc): worked
>>> Also verified list-manifests displayed data correctly.
>>> 
>>> I'm requesting that Clay review, as he's most familiar with the code I've 
>>> changed and he has a heads-up that it's coming.  Others are invited to 
>>> review as well.
>>> 
>>> Tomorrow I'll verify that an install starts as it should, but I have 
>>> confidence it will work as the webserver output looks fine.
>>> 
>>> Sorry to rush.  I'm requesting a review ASAP, by COB tomorrow if possible. 
>>> I'm supposed to be done on Friday, but to give less than 1 day for this 
>>> review is not really realistic.
>>>
>>>   Thanks,
>>>   Jack
>>> 
>
>
>

Thank you again for hearing my suggestions on your work. Thank you also 
for all of the nifty ideas which your work has brought up.

                                                                -Clay

[caiman-discuss] Please review: 4325 Better syntactic treatment of IP and MAC address AI criteria

Reply via email to