Re: [Freeipa-devel] DN patch and documentation

John Dennis Tue, 17 Jul 2012 15:48:42 -0700

On 07/10/2012 04:23 AM, Petr Viktorin wrote:

I've read your summary (which you should summarize into a commit message
before this is pushed), and gone through the patch.
Here is what I found doing that; I didn't get to actual testing yet.
I also didn't do a detailed review of ldap2.py changes yet.

Thank you for your careful review Petr, you had many good suggestionsand comments. It was an enormous amount of material to wade through,thank you for your diligence. I'm in agreement with most of yoursuggestions.

I agree with Simo that it would have been better to put at least your
automated changes in a separate patch. Without knowing what was
automated and what wasn't, I have to eyeball each change at least to
figure that out. Not complaining, it's a reviewer's work after all, and
with an intra-line diff tool it's not that hard, but I can't really
honor your suggestion for a faster review :)

I agree fully it would be nice if things were broken up into smallerwell defined patches. When I started the work I tried to keep thingsseparate but as I worked I discovered there were so many inter-dependentrelationships I was spending as much time juggling patch sets that couldbe applied independently as I was making forward progress to the realgoal. So I abandoned the idea of independent patch sets in favor ofgetting the work completed as expeditiously as possible, it was apragmatic trade-off. Now that the work is (nearly) completed I can seewhere some things can be broken out cleanly that was not obvious in themidst of the work. Hindsight is 20/20.

==== Blockers ====

Mutable objects that support __hash__ are a *very* bad thing to do. It
violates a fundamental assumption in Python, as you outlined. Mutable
objects *must not* be allowed to be dictionary keys.
I understand the patch is not perfect, but this is a big issue that
invites very hard-to-debug errors.
You say you're already working on a patch to fix this. Please post that
ASAP; I can't ack the DN work without it.

Understood, I agree with you and I will start work on those changesimmediately.

In the long run, it would be good to use immutable DNs only: I think
modeling them after Python strings in this way would be beneficial.

We edit DN's in a variety of places thus I feel mutable DN's have adistinct value.

If we only had immutable DN's then forming a new DN would require morecode that would extract the components from the right locations in theoriginal DN and use those extracted components to build a new DN. Thatwould not be too far from some of the earlier code which used a seriesof calls over multiple lines to slice and dice the strings. I believeusing just one simple obvious Python operator beats multiple lines ofconvoluted code every time. The expression of the concept is inherentlyobvious in the line of code using Python operator(s), plus it's vastlymore succinct (easier to read, easier to code, more robust).

Because the logic to build the new DN is inherently centralized in theobject operator it's a single implementation (much easier to maintainand prove correctness) rather than the cut-n-paste coping of code todecompose and reassemble an DN over many locations in the code base.

I believe mutable DN's have a valuable role to play and make the bestuse of Python's object oriented facilities. They go a long way to makingIPA more robust and easier to comprehend.

You seem to forged locked=True for some global DNs, e.g.
ipalib/plugins/pwpolicy.py:172: global_policy_dn
ipalib/plugins/migration.py:117: _compat_dn
Locked by default would prevent these omissions.



Please fix a lint failure; ipaserver.ipautil moved to ipapython.ipautil.



==== Design decision criticism ====

  > Originally I was against automatic type promotion on the belief if
you were comparing a string to a DN it was a failure of program logic,
but there were some circumstances where it was evil necessity.

Would you care to elaborate? I also think that's a logic failure; such
circumstances should at least be marked by a TODO.

I should have taken better notes because I don't remember the exactplace this came up, it was late in the testing and I believe the problemshowed up in the ldap updater, possibly some other places.

But here is the bottom line: I encountered a handful of places in thecode where a string was being compared to a DN object for equality (Iseem to recall the equality comparison was indirect by virtue of the"in" operator). What I do recall is it was logically impossible to knowa prori the string being tested was supposed to be a DN. If it werepossible to know the string represented a DN it would have been easy topromote it to a DN object. But there were a handful of situations wherethere simply was not enough information available to ascertain if thestring was logically a DN or not. This was never an issue when dn's weresimple strings. If it's getting compared to a DN that's a pretty stronghint it's supposed to be a DN (fwiw the coercion must occur in theequality operator).

So I pondered the problem for a while and decided there was legitimatevalue in allowing an equality comparison between a string and DN. Icould see no downside to it other than it *might* mask an error inprogram logic. I didn't take the decision lightly. I looked through thePython internals to see where type coercion occurs. While not often it'scertainly done where "it makes sense" (e.g. between string types,between numerical types, between buffer types, etc.). I thought onecould make the argument that a DN object is akin to a string type andhence falls into the same category of coercion as occurs between stringtypes. I'm not saying I'm in love with it, rather it's a pragmaticnecessity and can be justified if you look at it a certain way.

We could add a debug message with a backtrace whenever the coercionoccurs if we wanted to completely eliminate all places in the code wherea string-to-DN comparison occurs. But as I said I'm not sure it'spossible to absolutely eliminate this comparison. We've eliminated 99.9%of those cases already, I'm not sure the other 0.1% can be eliminated orthat it's worth the work involved. I'm happy with 99.9% :-)


  > The find() and rfind() methods (modeled after their string cousins)
to locate a RDN or DN in an existing DN.

I would much rather see the index() and rindex() methods, which do the
Pythonic thing of raising an exception rather than silently returning -1
when the thing isn't found.
Compare:
      something[something.find(thing)]  => something[-1] if not found
      something[something.index(thing)] => raises exception if not found

You're right exceptions are more Pythoninc than return codes. I can addindex and rindex and convert the code to catch an exception.

However, we use find and rfind extensively in the IPA code base(DN.find() and DN.rfind() simply replaced their string equivalentsduring the conversion). Perhaps you should open a ticket to purge alluse of find() and rfind() from our code base and add a prohibitionagainst their use in our Python coding standard document.



ipapython/dn.py:448:
      Each argument must be scalar convertable to unicode or a callable
object whose result is convertable to unicode.

Is it really necessary to allow callables here? If the user has a
callable, he/she should call it before making an AVA out of it. Same
with converting to Unicode.
As you point out in dn_summary, automatic coercion is evil.


There are two issues here, type coercion and string encoding.

Not all coercion is evil, there is value in making it easy to do theright thing and minimizing inline coercion that otherwise obfuscates thelogical intent of the code. We shouldn't require programmers to beperfect coders who never forget to perform some operation in thehundreds of places it might be required. Rather the tools should quietlyand consistently enforce the rules while leaving the code readable,concise and with clear intent.

There is a well established precedent in Python for coercion to stringin a string context. Coercion to string is so fundamental to Python ithas it's own low-level operator embedded in every class type definitionutilized in both Python and the CPython implementations.

1) We know all DN components must be strings. Let's say someone passesan integer object; is there really any harm to converting this to astring representation of an integer? That makes the DN API muchfriendlier and it's use more succinct. In fact Python does automaticcoercion to string in many places where it knows something must be astring, thus it's a well accepted and established practice. We doautomatic coercion to string in many places in IPA (a good example isthe ldap encode/decode operations).

I wish other Python API's also coerced to string as a service to thecaller. I wish python-ldap would have coerced to string those elementsof it's API which it knows must be strings. If it had we could have justpassed DN objects into the python-ldap API instead of having to wrap theclass to perform the coercion. This wouldn't be violating a guideline,rather it would be utilizing a inherent feature of the Python language.

There are an astonishing number of CPython modules that barf if you passthe wrong string type. Instead they should be friendly and provide theright string coercion. (There is a long and sordid history of thisproblem and I have a extensive write-up on it).

2) All strings in IPA are unicode. In python 3.x all strings will beunicode (the way it should have been in python 2.x alas). When stringcomparisons are performed they should be done as unicode objects, notencoded in some undefined encoding. Likewise when returning a stringbelonging to a component in a DN it should be unicode. Traditionallydn's have been str objects (a python-ldap foible), so the decoding ofstr to unicode promotes correctness in Python 2.x.

Accepting a callable is just a further refinement of the discussion in#1. If you allow for automatic coercion to sting (in a context thatdemands a string type) then why not allow anything which yields astring? It just makes the API friendlier and easier to use withoutintroducing ugly compromises. Some coercion is evil, but some is darnnice and useful. The trick is recognizing which is which.




==== Major issues ====

ipalib/plugins/aci.py:340:
+        groupdn = DN(groupdn)
+        try:
+            if groupdn[0].attr == 'cn':
+                dn = DN()
+                entry_attrs = {}
+                try:
+                    (dn, entry_attrs) = ldap.get_entry(groupdn, ['cn'])
+                except errors.NotFound, e:
+                    # FIXME, use real name here
+                    if test:
+                        dn = DN(('cn', 'test'),
api.env.container_permission)
+                        entry_attrs = {'cn': [u'test']}
+                if api.env.container_permission in dn:
+                    kw['permission'] = entry_attrs['cn'][0]
+                else:
+                    if 'cn' in entry_attrs:
+                        kw['group'] = entry_attrs['cn'][0]
+        except IndexError:
+            pass

Why is any IndexError ignored here?

Because the original test that entered the code block whose size you'reconcerned about was:


if groupdn.startswith('cn='):

The very first executable statement in the try block is

if groupdn[0].attr == 'cn'

If groupdn is actually empty it immediately exits the block becausethere will be an IndexError due to trying to access groupdn[0]. Thecombination of the try and the first if test mimics the original logicfor deciding if one should enter the block.

However, you make a valid point that there are other indexing statementsin the block that could raise an IndexError that should not be silentlycaught and ignored. I will change that.

I chose to use IndexError because error checking on DN "existencechecks" in the rest of the code most always uses IndexError andKeyError, I was trying to be consistent.


The test for entering the block can be changed to:

if len(groupdn) and groupdn[0].attr == 'cn':

and remove the try/except.

> That try block is much too long, it

could catch unintended errors. Please only `try` the smallest piece of
code that could raise the exception you want to catch.


see above

There are enough `DN(xyz.replace('ldap:///',''))` calls in aci.py to
warrant a strip_ldap_prefix helper. (It could also only remove the
prefix, not all the occurrences -- unlikely to matter but it's better to
do things correctly.)

No argument, I agree but I didn't write the code and I was trying not tochange code I didn't need to. Please open a ticket to get this fixed.

ipalib/plugins/baseldap.py:1028:
+            try:
+                if dn[0].attr != self.obj.primary_key.name:
+                    self.obj.handle_duplicate_entry(*keys)
+            except (IndexError, KeyError):
+                pass

Again, there's too much in the try block. You don't want to catch errors
from handle_duplicate_entry. Do this instead:
      try:
          dn_attr = dn[0].attr
      except (IndexError, KeyError):
          pass
      else:
          if dn_attr != self.obj.primary_key.name:
              self.obj.handle_duplicate_entry(*keys)


Good suggestion, I'll fix it.


==== Minor issues ====

In ipalib/plugins/aci.py:
@@ -343,10 +341,12 @@ def _aci_to_kw(ldap, a, test=False, pkey_only=False):
               else:
                   # See if the target is a group. If so we set the
                   # targetgroup attr, otherwise we consider it a subtree
-                if api.env.container_group in target:
-                    targetdn = unicode(target.replace('ldap:///',''))
-                    target = DN(targetdn)
-                    kw['targetgroup'] = target['cn']
+                targetdn = DN(target.replace('ldap:///',''))
+                if api.env.container_group in targetdn:
+                    try:
+                        kw['targetgroup'] = targetdn[0]['cn']
+                    except (IndexError, KeyError):
+                        pass
                   else:
                       kw['subtree'] = unicode(target)

Why is (IndexError, KeyError) ignored?

Because the original code did not seem to handle the error conditioneither and I wasn't sure what to do with the error. But you're right,silently ignoring those two exceptions is wrong. I'll fix it, good catchon your part.

Wouldn't be more correct to use endswith(DN(container_group, suffix))
instead of the `in` operator here, and in other places like this.

Good idea, the test you suggest is actually the correct test (shame onme for trying to translate the code too literally). I'll fix it.

ipapython/dn.py:26
      LOCKED_ERROR_MSG = 'Object is locked, it cannot be modified: %s'
and then:
      raise TypeError(LOCKED_ERROR_MSG % (repr(self)))

You should define a TypeError subclass for this.
Using a string instead of a specialized object is exactly what this
patch tries so hard to get rid of... :)

Moot point since locking will be replaced in favor of mutable andimmutable DN variants. Base Python exceptions would be raised insteadfor these types of coding errors, that's consistent with how we handlethose problems now.

ipapython/entity.py:49:
      elif isinstance(entrydata, DN):
          self.dn = entrydata
          self.data = ipautil.CIDict()
      elif isinstance(entrydata, basestring):
          self.dn = DN(entrydata)
          self.data = ipautil.CIDict()

Should we not use DNs exclusively here? At least a TODO is warranted.

It seemed reasonable to allow the constructor to coerce to DN ratherthan forcing the caller of the constructor to coerce a parameter.

Someday soon I hope the Entity and Entry classes go away as we convergeon a single API. In the interim this compromise seems reasonable andmeant there were fewer places to find and fix (everywhere where Entityand Entry objects were instantiated).


ipaserver/ipaldap.py:109:
      elif isinstance(entrydata,DN):
          self.dn = entrydata
          self.data = ipautil.CIDict()
      elif isinstance(entrydata,str) or isinstance(entrydata,unicode):
          self.dn = DN(entrydata)
          self.data = ipautil.CIDict()

Again, why not use DNs exclusively? At least put in a TODO.
For now at least do isinstance(entrydata, basestring) instead of the two
isinstances.

Agreed, there should be one isinstance(), I didn't code it, but I shouldhave fixed it when I saw it.



In several places:
+    # Use a property for <attr> to assure it's always a DN type
+    def _set_<attr>(self, value):
+        self._<attr> = DN(value)
+
+    def _get_<attr>(self):
+        assert isinstance(getattr(self, '_<attr>'), DN)
+        return self._<attr>
+
+    ldap_suffix = property(_get_<attr>, _set_<attr>)

Shouldn't the assert be in the setter? Do we not want people to only put
DNs in these properties, instead of silently converting?

I'm afraid I don't follow you here. When setting the property it shouldbe coerced to a DN, we want to force the property to always be a DN.

However it's possible for someone to set the _<attr> directly eventhough they should never do that, the assert in the getter is justbullet-proofing against that.

This appears enough times to justify using a helper to construct these.
But if you don't want to generalize, the getattr call is unnecessary.


Once again I don't follow you. How would a helper work?

I don't know why there is a getattr in the getter, it would seem to beunnecessary unless I was trying to protect against an uninitializedattribute. The getattr should probably be replaced with normal classattribute access. I'll fix that.

tests/util.py:307:
      if isinstance(expected, DN):
          if isinstance(got, basestring):
              got = DN(got)

This may be overly broad, the tests should be able to check if they got
a DN or a string. What about something like:

      if isinstance(expected, DNString):
          assert isinstance(got, basestring)
          assert DN(got) == expected.dn
with a suitable DNString class?

Sorry I don't understand what you're trying to protect against nor whata DNString class would be or what it's purpose is.

ipapython/dn.py:541: def _set_locked(self, value):
There should be only one way to do things, either the lock/unlock
methods, or setting an attribute. More ways to do one same thing lead to
subtle errors in one of the ways.

The locking code is duplicated in AVA, RDN and DN; it would be good to
make a Lockable mixin to avoid that. A similar case is with __repr__.

Anyway unlocking is bad (see above) so these are mostly moot points.

A mixin would eliminate code duplication but in this case theduplication is trivial, not sure it warrants introducing a new class inthis circumstance (judgment call), but the topic is moot because lockingis going to be eliminated.


ipaserver/plugins/ldap2.py:134:
      class ServerSchema(object):

Please don't use nested classes without reason.

There is a valid reason, the ServerSchema class is private to theSchemaCache class.

==== Nitpicks/opinions ====

In ipa-replica-manage:369
  >    sys.exit("winsync agreement already exists on subtree %s" %
please remove the trailing space



ipapython/dn.py:
      in docstring:  DN(arg0, ..., locked=False, first_key_match=True)
      followed by:  def __init__(self, *args, **kwds):
      and:  kwds.get('first_key_match', True)

I don't see the reason for this. Just write `def __init__(self, *args,
locked=False, first_key_match=True)` and put a proper summary in the
first line of the docstring. Same in AVA & RDN.


A valid point. I think I was trying to be too flexible/extensible.

ipapython/ipautil.py:201:
-    """Convert a kerberos realm into the IPA suffix."""

Why are you removing a docstring?


Must have been a mistake. I'll fix.

ipaserver/plugins/ldap2.py:79:
      _debug_log_ldap = True
      if _debug_log_ldap:
          import pprint

Just import it unconditionally if you need it.

Not to worry, I'm going to nuke the pprint formatting anyway, it neverworked correctly and when it did it didn't add much value.

ipalib/plugins/baseldap.py:1874:
- sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0].lower(),
y[1][self.obj.primary_key.name][0].lower())
+    sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0],
+        y[1][self.obj.primary_key.name][0])
      entries.sort(sortfn)

Since you're touching this: it's better (easier, faster, more readable)
to use a key function. (Also, defining a named function is better with
`def foo` than `foo=lambda...`):
      def sortkey(x):
          return x[1][self.obj.primary_key.name][0]
      entries.sort(key=sortkey)

Agreed, it would have been better to rewrite this using a key functionrather than preserving the old code, I'll fix.



ipapython/entity.py:95:
      # Python "internal" class attributes are prefixed and suffixed
        with double underscores.
      # Make sure we continue to return things like __str__ from
        this class.

I think a better solution would be making Entity a new-style class.
Or even better, killing Entity off altogether. Which is the plan, so
this is a moot point.
Same with Entry in ipaserver/ipaldap.py.

Yeah, that would have worked too, probably cleaner, but I really wantthese classes to go away, not much point in tweaking them now.

ipaserver/install/ldapupdate.py:132:
+        assert isinstance(suffix, (None.__class__, DN))

type(x) is preferable to x.__class__ (the same way e.g. len(x) is
preferable to x.__len__). However, I think it would be more readable to use:
      if suffix is not None:
          assert isinstance(suffix, DN)

I don't think you can use type() to cover the case when something mightbe a derived subclass, hence the way it was coded (I anticipated DNmight be subclassed). I was trying to allow for both None or any DN orsubclass of DN in one statement. Perhaps not the best idea.


Your suggestion is more readable, I'll change it to match your suggestion.

ipaserver/plugins/ldap2.py:109:
def value_to_utf8(val):
      '''
      If val is not a string we need to convert it to a string
      (specifically a unicode string). Naively we might think we need to
      call str(val) to convert to a string. This is incorrect because if
      val is already a unicode object then str() will call
      encode(default_encoding) returning a str object encoded with
      default_encoding. But we don't want to apply the default_encoding!
      Rather we want to guarantee the val object has been converted to a
      unicode string because from a unicode string we want to explicitly
      encode to a str using our desired encoding (utf-8 in this case).

      Note: calling unicode on a unicode object simply returns the exact
      same object (with it's ref count incremented). This means calling
      unicode on a unicode object is effectively a no-op, thus it's not
      inefficient.
      '''
That is not an explanation of *what* the function does, but *how* it
does it. Thus it should be in a comment, not in a docstring.

Also, if something needs such an explanation, it also need unit tests to
ensure it actually works correctly.

IPASimpleLDAPObject has the same issue -- lengthy descriptions of design
decisions should go in a comment (if not in a commit message or on the
mailing list).



Valid point, I'll make them comments.

  > Require that every place a dn is used it must be a DN object. This
  > translates into lot of `assert isinstance(dn, DN)` sprinkled through
  > out the code.

That seems like a crutch to get your DN work done. We should remove
these once the dust settles.

I disagree with removing the asserts. You're correct they were aninvaluable aid in performing the conversion but they have a value beyondthat. I fully expect programmers to make mistakes with DN usage in thefuture. Such mistakes are really easy to make especially when for yearswe just used strings. I've already seen examples of dn string formattinggetting committed to master when folks should have known better. As suchI believe the asserts will continue to play a valuable role in the(foreseeable?) future.

We do use asserts in a fair number of other locations. I don't presumeyou mean you want all assert statements removed.

Instead I think we should make sure the Python interpreter is startedwith the -O option in production mode, then assertions will be strippedout and not evaluated eliminating the overhead.


see http://wiki.python.org/moin/UsingAssertionsEffectively

  > Python provides the __all__ module export list, we really should be
  > making better use of that.

I recommend importing names explicitly, or importing a module and using
qualified names, over __all__ and import *. Star imports make it
unnecessarily hard to see where names come from.
(You do use explicit imports in the code, my gripe is just about
recommending * and __all__).

We should come to a project consensus on this topic and add it to ourPython style guide. This really isn't germane to the dn work.

  > I equate Python decorators with C preprocessor hell when CPP macros
  > are abused.

I do not share that view. It is possible to abuse decorators, but not
all of their use is necessarily evil.
It's true that encode.py wasn't a good use of them.

Agreed, decorators are not always bad, they have a role to play. I wasperhaps being too dramatic in my wording and not selective enough. Onlysome decorators are like C preprocessor hell, some are perfectly fine.

It took me so long to unravel and fully comprehend what theencode/decode decorators were doing in detail that I was probablyventing my frustration with that experience.

  > Removed all usage of get_schema() which was being called prior to
  > accessing the .schema attribute of an object. If a class is using
  > internal lazy loading as an optimization it's not right to require
  > users of the interface to be aware of internal optimization's.
  > Instead the class should handle it's lazy loading completely
  > internally by loading the data on first access.
  > This is very easy to do with class properties. schema is now a
  > property of the class instead of an attribute. When the schema
  > property is accessed it actually calls a private internal method to
  > perform the lazy loading.

In other words, you're reinventing the reify decorator:
https://github.com/Pylons/pyramid/blob/master/pyramid/decorator.py#L1

Hmm... I wasn't aware of reify. A quick look at it and it smacks ofself-modifying code, more hidden weirdness that's not obvious in staticcode reading. Personally I vastly prefer things to be obvious and nottricky, clever, obscure, and hidden.


Being a grammar nazi, I must point out the it's/its confusion and
plurals formed with apostrophes.

Yup, can't disagree, but heck I pumped out 10 pages of documentation ina little over a day, there was some grammar road-kill along the way nodoubt.

BTW, as a non-native English speaker it's admirable to pick these thingsup. I'm impressed!

====

Thanks for the work! This will really make IPA better when it's polished
a bit. Hopefully all the criticism doesn't sound too negative :)

Thank you for the review. It was a lot of material to slog though andyour careful attention is appreciated, it will definitely make itbetter. You made good points and I was in agreement with the majority ofthem. Thank you.



--
John Dennis <jden...@redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/


_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Re: [Freeipa-devel] DN patch and documentation

Reply via email to