On 07/10/2012 04:23 AM, Petr Viktorin wrote:
I've read your summary (which you should summarize into a commit message
before this is pushed), and gone through the patch.
Here is what I found doing that; I didn't get to actual testing yet.
I also didn't do a detailed review of ldap2.py changes yet.

Thank you for your careful review Petr, you had many good suggestions and comments. It was an enormous amount of material to wade through, thank you for your diligence. I'm in agreement with most of your suggestions.

I agree with Simo that it would have been better to put at least your
automated changes in a separate patch. Without knowing what was
automated and what wasn't, I have to eyeball each change at least to
figure that out. Not complaining, it's a reviewer's work after all, and
with an intra-line diff tool it's not that hard, but I can't really
honor your suggestion for a faster review :)

I agree fully it would be nice if things were broken up into smaller well defined patches. When I started the work I tried to keep things separate but as I worked I discovered there were so many inter-dependent relationships I was spending as much time juggling patch sets that could be applied independently as I was making forward progress to the real goal. So I abandoned the idea of independent patch sets in favor of getting the work completed as expeditiously as possible, it was a pragmatic trade-off. Now that the work is (nearly) completed I can see where some things can be broken out cleanly that was not obvious in the midst of the work. Hindsight is 20/20.


==== Blockers ====

Mutable objects that support __hash__ are a *very* bad thing to do. It
violates a fundamental assumption in Python, as you outlined. Mutable
objects *must not* be allowed to be dictionary keys.
I understand the patch is not perfect, but this is a big issue that
invites very hard-to-debug errors.
You say you're already working on a patch to fix this. Please post that
ASAP; I can't ack the DN work without it.

Understood, I agree with you and I will start work on those changes immediately.

In the long run, it would be good to use immutable DNs only: I think
modeling them after Python strings in this way would be beneficial.

We edit DN's in a variety of places thus I feel mutable DN's have a distinct value.

If we only had immutable DN's then forming a new DN would require more code that would extract the components from the right locations in the original DN and use those extracted components to build a new DN. That would not be too far from some of the earlier code which used a series of calls over multiple lines to slice and dice the strings. I believe using just one simple obvious Python operator beats multiple lines of convoluted code every time. The expression of the concept is inherently obvious in the line of code using Python operator(s), plus it's vastly more succinct (easier to read, easier to code, more robust).

Because the logic to build the new DN is inherently centralized in the object operator it's a single implementation (much easier to maintain and prove correctness) rather than the cut-n-paste coping of code to decompose and reassemble an DN over many locations in the code base.

I believe mutable DN's have a valuable role to play and make the best use of Python's object oriented facilities. They go a long way to making IPA more robust and easier to comprehend.

You seem to forged locked=True for some global DNs, e.g.
ipalib/plugins/pwpolicy.py:172: global_policy_dn
ipalib/plugins/migration.py:117: _compat_dn
Locked by default would prevent these omissions.



Please fix a lint failure; ipaserver.ipautil moved to ipapython.ipautil.



==== Design decision criticism ====

  > Originally I was against automatic type promotion on the belief if
you were comparing a string to a DN it was a failure of program logic,
but there were some circumstances where it was evil necessity.

Would you care to elaborate? I also think that's a logic failure; such
circumstances should at least be marked by a TODO.

I should have taken better notes because I don't remember the exact place this came up, it was late in the testing and I believe the problem showed up in the ldap updater, possibly some other places.

But here is the bottom line: I encountered a handful of places in the code where a string was being compared to a DN object for equality (I seem to recall the equality comparison was indirect by virtue of the "in" operator). What I do recall is it was logically impossible to know a prori the string being tested was supposed to be a DN. If it were possible to know the string represented a DN it would have been easy to promote it to a DN object. But there were a handful of situations where there simply was not enough information available to ascertain if the string was logically a DN or not. This was never an issue when dn's were simple strings. If it's getting compared to a DN that's a pretty strong hint it's supposed to be a DN (fwiw the coercion must occur in the equality operator).

So I pondered the problem for a while and decided there was legitimate value in allowing an equality comparison between a string and DN. I could see no downside to it other than it *might* mask an error in program logic. I didn't take the decision lightly. I looked through the Python internals to see where type coercion occurs. While not often it's certainly done where "it makes sense" (e.g. between string types, between numerical types, between buffer types, etc.). I thought one could make the argument that a DN object is akin to a string type and hence falls into the same category of coercion as occurs between string types. I'm not saying I'm in love with it, rather it's a pragmatic necessity and can be justified if you look at it a certain way.

We could add a debug message with a backtrace whenever the coercion occurs if we wanted to completely eliminate all places in the code where a string-to-DN comparison occurs. But as I said I'm not sure it's possible to absolutely eliminate this comparison. We've eliminated 99.9% of those cases already, I'm not sure the other 0.1% can be eliminated or that it's worth the work involved. I'm happy with 99.9% :-)

  > The find() and rfind() methods (modeled after their string cousins)
to locate a RDN or DN in an existing DN.

I would much rather see the index() and rindex() methods, which do the
Pythonic thing of raising an exception rather than silently returning -1
when the thing isn't found.
Compare:
      something[something.find(thing)]  => something[-1] if not found
      something[something.index(thing)] => raises exception if not found

You're right exceptions are more Pythoninc than return codes. I can add index and rindex and convert the code to catch an exception.

However, we use find and rfind extensively in the IPA code base (DN.find() and DN.rfind() simply replaced their string equivalents during the conversion). Perhaps you should open a ticket to purge all use of find() and rfind() from our code base and add a prohibition against their use in our Python coding standard document.



ipapython/dn.py:448:
      Each argument must be scalar convertable to unicode or a callable
object whose result is convertable to unicode.

Is it really necessary to allow callables here? If the user has a
callable, he/she should call it before making an AVA out of it. Same
with converting to Unicode.
As you point out in dn_summary, automatic coercion is evil.

There are two issues here, type coercion and string encoding.

Not all coercion is evil, there is value in making it easy to do the right thing and minimizing inline coercion that otherwise obfuscates the logical intent of the code. We shouldn't require programmers to be perfect coders who never forget to perform some operation in the hundreds of places it might be required. Rather the tools should quietly and consistently enforce the rules while leaving the code readable, concise and with clear intent.

There is a well established precedent in Python for coercion to string in a string context. Coercion to string is so fundamental to Python it has it's own low-level operator embedded in every class type definition utilized in both Python and the CPython implementations.


1) We know all DN components must be strings. Let's say someone passes an integer object; is there really any harm to converting this to a string representation of an integer? That makes the DN API much friendlier and it's use more succinct. In fact Python does automatic coercion to string in many places where it knows something must be a string, thus it's a well accepted and established practice. We do automatic coercion to string in many places in IPA (a good example is the ldap encode/decode operations).

I wish other Python API's also coerced to string as a service to the caller. I wish python-ldap would have coerced to string those elements of it's API which it knows must be strings. If it had we could have just passed DN objects into the python-ldap API instead of having to wrap the class to perform the coercion. This wouldn't be violating a guideline, rather it would be utilizing a inherent feature of the Python language.

There are an astonishing number of CPython modules that barf if you pass the wrong string type. Instead they should be friendly and provide the right string coercion. (There is a long and sordid history of this problem and I have a extensive write-up on it).

2) All strings in IPA are unicode. In python 3.x all strings will be unicode (the way it should have been in python 2.x alas). When string comparisons are performed they should be done as unicode objects, not encoded in some undefined encoding. Likewise when returning a string belonging to a component in a DN it should be unicode. Traditionally dn's have been str objects (a python-ldap foible), so the decoding of str to unicode promotes correctness in Python 2.x.

Accepting a callable is just a further refinement of the discussion in #1. If you allow for automatic coercion to sting (in a context that demands a string type) then why not allow anything which yields a string? It just makes the API friendlier and easier to use without introducing ugly compromises. Some coercion is evil, but some is darn nice and useful. The trick is recognizing which is which.




==== Major issues ====

ipalib/plugins/aci.py:340:
+        groupdn = DN(groupdn)
+        try:
+            if groupdn[0].attr == 'cn':
+                dn = DN()
+                entry_attrs = {}
+                try:
+                    (dn, entry_attrs) = ldap.get_entry(groupdn, ['cn'])
+                except errors.NotFound, e:
+                    # FIXME, use real name here
+                    if test:
+                        dn = DN(('cn', 'test'),
api.env.container_permission)
+                        entry_attrs = {'cn': [u'test']}
+                if api.env.container_permission in dn:
+                    kw['permission'] = entry_attrs['cn'][0]
+                else:
+                    if 'cn' in entry_attrs:
+                        kw['group'] = entry_attrs['cn'][0]
+        except IndexError:
+            pass

Why is any IndexError ignored here?

Because the original test that entered the code block whose size you're concerned about was:

if groupdn.startswith('cn='):

The very first executable statement in the try block is

if groupdn[0].attr == 'cn'

If groupdn is actually empty it immediately exits the block because there will be an IndexError due to trying to access groupdn[0]. The combination of the try and the first if test mimics the original logic for deciding if one should enter the block.

However, you make a valid point that there are other indexing statements in the block that could raise an IndexError that should not be silently caught and ignored. I will change that.

I chose to use IndexError because error checking on DN "existence checks" in the rest of the code most always uses IndexError and KeyError, I was trying to be consistent.

The test for entering the block can be changed to:

if len(groupdn) and groupdn[0].attr == 'cn':

and remove the try/except.

> That try block is much too long, it
could catch unintended errors. Please only `try` the smallest piece of
code that could raise the exception you want to catch.

see above

There are enough `DN(xyz.replace('ldap:///',''))` calls in aci.py to
warrant a strip_ldap_prefix helper. (It could also only remove the
prefix, not all the occurrences -- unlikely to matter but it's better to
do things correctly.)

No argument, I agree but I didn't write the code and I was trying not to change code I didn't need to. Please open a ticket to get this fixed.

ipalib/plugins/baseldap.py:1028:
+            try:
+                if dn[0].attr != self.obj.primary_key.name:
+                    self.obj.handle_duplicate_entry(*keys)
+            except (IndexError, KeyError):
+                pass

Again, there's too much in the try block. You don't want to catch errors
from handle_duplicate_entry. Do this instead:
      try:
          dn_attr = dn[0].attr
      except (IndexError, KeyError):
          pass
      else:
          if dn_attr != self.obj.primary_key.name:
              self.obj.handle_duplicate_entry(*keys)

Good suggestion, I'll fix it.


==== Minor issues ====

In ipalib/plugins/aci.py:
@@ -343,10 +341,12 @@ def _aci_to_kw(ldap, a, test=False, pkey_only=False):
               else:
                   # See if the target is a group. If so we set the
                   # targetgroup attr, otherwise we consider it a subtree
-                if api.env.container_group in target:
-                    targetdn = unicode(target.replace('ldap:///',''))
-                    target = DN(targetdn)
-                    kw['targetgroup'] = target['cn']
+                targetdn = DN(target.replace('ldap:///',''))
+                if api.env.container_group in targetdn:
+                    try:
+                        kw['targetgroup'] = targetdn[0]['cn']
+                    except (IndexError, KeyError):
+                        pass
                   else:
                       kw['subtree'] = unicode(target)

Why is (IndexError, KeyError) ignored?

Because the original code did not seem to handle the error condition either and I wasn't sure what to do with the error. But you're right, silently ignoring those two exceptions is wrong. I'll fix it, good catch on your part.

Wouldn't be more correct to use endswith(DN(container_group, suffix))
instead of the `in` operator here, and in other places like this.

Good idea, the test you suggest is actually the correct test (shame on me for trying to translate the code too literally). I'll fix it.

ipapython/dn.py:26
      LOCKED_ERROR_MSG = 'Object is locked, it cannot be modified: %s'
and then:
      raise TypeError(LOCKED_ERROR_MSG % (repr(self)))

You should define a TypeError subclass for this.
Using a string instead of a specialized object is exactly what this
patch tries so hard to get rid of... :)

Moot point since locking will be replaced in favor of mutable and immutable DN variants. Base Python exceptions would be raised instead for these types of coding errors, that's consistent with how we handle those problems now.

ipapython/entity.py:49:
      elif isinstance(entrydata, DN):
          self.dn = entrydata
          self.data = ipautil.CIDict()
      elif isinstance(entrydata, basestring):
          self.dn = DN(entrydata)
          self.data = ipautil.CIDict()

Should we not use DNs exclusively here? At least a TODO is warranted.

It seemed reasonable to allow the constructor to coerce to DN rather than forcing the caller of the constructor to coerce a parameter.

Someday soon I hope the Entity and Entry classes go away as we converge on a single API. In the interim this compromise seems reasonable and meant there were fewer places to find and fix (everywhere where Entity and Entry objects were instantiated).


ipaserver/ipaldap.py:109:
      elif isinstance(entrydata,DN):
          self.dn = entrydata
          self.data = ipautil.CIDict()
      elif isinstance(entrydata,str) or isinstance(entrydata,unicode):
          self.dn = DN(entrydata)
          self.data = ipautil.CIDict()

Again, why not use DNs exclusively? At least put in a TODO.
For now at least do isinstance(entrydata, basestring) instead of the two
isinstances.

Agreed, there should be one isinstance(), I didn't code it, but I should have fixed it when I saw it.



In several places:
+    # Use a property for <attr> to assure it's always a DN type
+    def _set_<attr>(self, value):
+        self._<attr> = DN(value)
+
+    def _get_<attr>(self):
+        assert isinstance(getattr(self, '_<attr>'), DN)
+        return self._<attr>
+
+    ldap_suffix = property(_get_<attr>, _set_<attr>)

Shouldn't the assert be in the setter? Do we not want people to only put
DNs in these properties, instead of silently converting?

I'm afraid I don't follow you here. When setting the property it should be coerced to a DN, we want to force the property to always be a DN.

However it's possible for someone to set the _<attr> directly even though they should never do that, the assert in the getter is just bullet-proofing against that.

This appears enough times to justify using a helper to construct these.
But if you don't want to generalize, the getattr call is unnecessary.

Once again I don't follow you. How would a helper work?

I don't know why there is a getattr in the getter, it would seem to be unnecessary unless I was trying to protect against an uninitialized attribute. The getattr should probably be replaced with normal class attribute access. I'll fix that.

tests/util.py:307:
      if isinstance(expected, DN):
          if isinstance(got, basestring):
              got = DN(got)

This may be overly broad, the tests should be able to check if they got
a DN or a string. What about something like:

      if isinstance(expected, DNString):
          assert isinstance(got, basestring)
          assert DN(got) == expected.dn
with a suitable DNString class?

Sorry I don't understand what you're trying to protect against nor what a DNString class would be or what it's purpose is.

ipapython/dn.py:541: def _set_locked(self, value):
There should be only one way to do things, either the lock/unlock
methods, or setting an attribute. More ways to do one same thing lead to
subtle errors in one of the ways.

The locking code is duplicated in AVA, RDN and DN; it would be good to
make a Lockable mixin to avoid that. A similar case is with __repr__.

Anyway unlocking is bad (see above) so these are mostly moot points.

A mixin would eliminate code duplication but in this case the duplication is trivial, not sure it warrants introducing a new class in this circumstance (judgment call), but the topic is moot because locking is going to be eliminated.


ipaserver/plugins/ldap2.py:134:
      class ServerSchema(object):

Please don't use nested classes without reason.

There is a valid reason, the ServerSchema class is private to the SchemaCache class.

==== Nitpicks/opinions ====

In ipa-replica-manage:369
  >    sys.exit("winsync agreement already exists on subtree %s" %
please remove the trailing space



ipapython/dn.py:
      in docstring:  DN(arg0, ..., locked=False, first_key_match=True)
      followed by:  def __init__(self, *args, **kwds):
      and:  kwds.get('first_key_match', True)

I don't see the reason for this. Just write `def __init__(self, *args,
locked=False, first_key_match=True)` and put a proper summary in the
first line of the docstring. Same in AVA & RDN.

A valid point. I think I was trying to be too flexible/extensible.

ipapython/ipautil.py:201:
-    """Convert a kerberos realm into the IPA suffix."""

Why are you removing a docstring?

Must have been a mistake. I'll fix.

ipaserver/plugins/ldap2.py:79:
      _debug_log_ldap = True
      if _debug_log_ldap:
          import pprint

Just import it unconditionally if you need it.

Not to worry, I'm going to nuke the pprint formatting anyway, it never worked correctly and when it did it didn't add much value.

ipalib/plugins/baseldap.py:1874:
- sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0].lower(),
y[1][self.obj.primary_key.name][0].lower())
+    sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0],
+        y[1][self.obj.primary_key.name][0])
      entries.sort(sortfn)

Since you're touching this: it's better (easier, faster, more readable)
to use a key function. (Also, defining a named function is better with
`def foo` than `foo=lambda...`):
      def sortkey(x):
          return x[1][self.obj.primary_key.name][0]
      entries.sort(key=sortkey)

Agreed, it would have been better to rewrite this using a key function rather than preserving the old code, I'll fix.


ipapython/entity.py:95:
      # Python "internal" class attributes are prefixed and suffixed
        with double underscores.
      # Make sure we continue to return things like __str__ from
        this class.

I think a better solution would be making Entity a new-style class.
Or even better, killing Entity off altogether. Which is the plan, so
this is a moot point.
Same with Entry in ipaserver/ipaldap.py.

Yeah, that would have worked too, probably cleaner, but I really want these classes to go away, not much point in tweaking them now.

ipaserver/install/ldapupdate.py:132:
+        assert isinstance(suffix, (None.__class__, DN))

type(x) is preferable to x.__class__ (the same way e.g. len(x) is
preferable to x.__len__). However, I think it would be more readable to use:
      if suffix is not None:
          assert isinstance(suffix, DN)

I don't think you can use type() to cover the case when something might be a derived subclass, hence the way it was coded (I anticipated DN might be subclassed). I was trying to allow for both None or any DN or subclass of DN in one statement. Perhaps not the best idea.

Your suggestion is more readable, I'll change it to match your suggestion.

ipaserver/plugins/ldap2.py:109:
def value_to_utf8(val):
      '''
      If val is not a string we need to convert it to a string
      (specifically a unicode string). Naively we might think we need to
      call str(val) to convert to a string. This is incorrect because if
      val is already a unicode object then str() will call
      encode(default_encoding) returning a str object encoded with
      default_encoding. But we don't want to apply the default_encoding!
      Rather we want to guarantee the val object has been converted to a
      unicode string because from a unicode string we want to explicitly
      encode to a str using our desired encoding (utf-8 in this case).

      Note: calling unicode on a unicode object simply returns the exact
      same object (with it's ref count incremented). This means calling
      unicode on a unicode object is effectively a no-op, thus it's not
      inefficient.
      '''
That is not an explanation of *what* the function does, but *how* it
does it. Thus it should be in a comment, not in a docstring.

Also, if something needs such an explanation, it also need unit tests to
ensure it actually works correctly.

IPASimpleLDAPObject has the same issue -- lengthy descriptions of design
decisions should go in a comment (if not in a commit message or on the
mailing list).


Valid point, I'll make them comments.


  > Require that every place a dn is used it must be a DN object. This
  > translates into lot of `assert isinstance(dn, DN)` sprinkled through
  > out the code.

That seems like a crutch to get your DN work done. We should remove
these once the dust settles.

I disagree with removing the asserts. You're correct they were an invaluable aid in performing the conversion but they have a value beyond that. I fully expect programmers to make mistakes with DN usage in the future. Such mistakes are really easy to make especially when for years we just used strings. I've already seen examples of dn string formatting getting committed to master when folks should have known better. As such I believe the asserts will continue to play a valuable role in the (foreseeable?) future.

We do use asserts in a fair number of other locations. I don't presume you mean you want all assert statements removed.

Instead I think we should make sure the Python interpreter is started with the -O option in production mode, then assertions will be stripped out and not evaluated eliminating the overhead.

see http://wiki.python.org/moin/UsingAssertionsEffectively

  > Python provides the __all__ module export list, we really should be
  > making better use of that.

I recommend importing names explicitly, or importing a module and using
qualified names, over __all__ and import *. Star imports make it
unnecessarily hard to see where names come from.
(You do use explicit imports in the code, my gripe is just about
recommending * and __all__).

We should come to a project consensus on this topic and add it to our Python style guide. This really isn't germane to the dn work.

  > I equate Python decorators with C preprocessor hell when CPP macros
  > are abused.

I do not share that view. It is possible to abuse decorators, but not
all of their use is necessarily evil.
It's true that encode.py wasn't a good use of them.

Agreed, decorators are not always bad, they have a role to play. I was perhaps being too dramatic in my wording and not selective enough. Only some decorators are like C preprocessor hell, some are perfectly fine.

It took me so long to unravel and fully comprehend what the encode/decode decorators were doing in detail that I was probably venting my frustration with that experience.


  > Removed all usage of get_schema() which was being called prior to
  > accessing the .schema attribute of an object. If a class is using
  > internal lazy loading as an optimization it's not right to require
  > users of the interface to be aware of internal optimization's.
  > Instead the class should handle it's lazy loading completely
  > internally by loading the data on first access.
  > This is very easy to do with class properties. schema is now a
  > property of the class instead of an attribute. When the schema
  > property is accessed it actually calls a private internal method to
  > perform the lazy loading.

In other words, you're reinventing the reify decorator:
https://github.com/Pylons/pyramid/blob/master/pyramid/decorator.py#L1

Hmm... I wasn't aware of reify. A quick look at it and it smacks of self-modifying code, more hidden weirdness that's not obvious in static code reading. Personally I vastly prefer things to be obvious and not tricky, clever, obscure, and hidden.


Being a grammar nazi, I must point out the it's/its confusion and
plurals formed with apostrophes.

Yup, can't disagree, but heck I pumped out 10 pages of documentation in a little over a day, there was some grammar road-kill along the way no doubt.

BTW, as a non-native English speaker it's admirable to pick these things up. I'm impressed!

====

Thanks for the work! This will really make IPA better when it's polished
a bit. Hopefully all the criticism doesn't sound too negative :)

Thank you for the review. It was a lot of material to slog though and your careful attention is appreciated, it will definitely make it better. You made good points and I was in agreement with the majority of them. Thank you.


--
John Dennis <jden...@redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/


_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Reply via email to