[sqlalchemy] Re: SA 0.3.x: performance issues = proposal
As attachement, here's the patch (against rev 2132). It's local to sqlalchemy/orm/, so: cd /sqlalchemy/orm patch -p0 attribute_cache.patch About the patch itself: 1. the cache should be a WeakKeyDictionary OK, done. Performances are still OK, differences between built-in dict are negligible. 2. the cache needs to be cleared for a particular cache (class ?) If I understand right, while calling register_attribute(), the cache will be cleared for the given class, if this one has been cached. Is there anywhere else the cache needs to cleared ? 3. the cache should *probably* be at the module level, and not within the AttributeManager itself; SA uses only a single AttributeManager per application so it probably doesnt matter I let the cache within the AttributeManager... Things can be moved at the module level, but if it's not necessary right now, it might not be a good idea to do it right now (when it will be necessary, the design may have been changed and modifications not valid anymore). 4. the raise should use an exception class of some kind I've made it raise a TypeError... Yes, string based exceptions are very a bad thing ! Attributes are cached while using the managed_attributes() *and* noninherited_managed_attributes() funcs. Those are very similar. I hesitated refactor them, but centralize cache management is probably a good thing. This is the purpose of the second patch, which include this refactoring. As expected, performances decrease. Here's the traditionnal benchmark results: SA 0.3.1, rev 2127: total time 4.29376721382 real0m5.420s user0m4.088s sys 0m0.108s SA with attr. cache, no refactoring (~2X faster, back closed to 0.2.8 speed): total time 2.34819602966 real0m3.013s user0m2.344s sys 0m0.088s SA with attr. cache, with refactoring (30% slower than without refactoring) total time 3.05679416656 real0m3.747s user0m2.988s sys 0m0.068s It's up to you choosing the patch ! IMO, I'm *not* in favor to use refactoring in this case :). Finally, I've put a clear_attribute_cache func which, ... clear the attribute cache. While client code may not have to worry about caching, it may need to clear it... thanks much, this is the kind of user involvement i like. Well, you're very welcome ! Glad to help ! Cheers, Seb -- Sébastien LELONG sebastien.lelong[at]sirloon.net P.S: im beginning to suspect that yield introduces overhead vs. a straight list ? (python interpreter storing stack frames ? dunno). dunnotoo :) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~--- Index: attributes.py === --- attributes.py (révision 2132) +++ attributes.py (copie de travail) @@ -609,8 +609,18 @@ class AttributeManager(object): allows the instrumentation of object attributes. AttributeManager is stateless, but can be -overridden by subclasses to redefine some of its factory operations. +overridden by subclasses to redefine some of its factory operations. Also be aware AttributeManager +will cache attributes for a given class, allowing not to determine those for each objects (used +in managed_attributes() and noninherited_managed_attributes()). This cache is cleared for a given class +while calling register_attribute(), and can be cleared using clear_attribute_cache() +def __init__(self): +# will cache attributes, indexed by class objects +self._attribute_cache = weakref.WeakKeyDictionary() + +def clear_attribute_cache(self): +self._attribute_cache.clear() + def rollback(self, *obj): retrieves the committed history for each object in the given list, and rolls back the attributes each instance to their original value. @@ -639,21 +649,31 @@ def managed_attributes(self, class_): returns an iterator of all InstrumentedAttribute objects associated with the given class. +if self._attribute_cache.has_key(class_): +return self._attribute_cache[class_] + +self._attribute_cache[class_] = [] if not isinstance(class_, type): -raise repr(class_) + is not a type +raise TypeError(repr(class_) + is not a type) for key in dir(class_): value = getattr(class_, key, None) if
[sqlalchemy] Re: SA 0.3.x: performance issues = proposal
i think that is a great idea ! can you make a patch for me ? here are some things i think it should have: 1. the cache should be a WeakKeyDictionary, if possible, so that an application which creates classes on the fly doesnt run out of memory. this adds a micro-amount of overhead vs. a dictionary but i think its negligible (compared to all the isintance calls and yields we are losing) 2. the cache needs to be cleared for a particular cache whenever the attribute manager is accessed to add or modify properties on the class (i.e. the register_attribute method) 3. the cache should *probably* be at the module level, and not within the AttributeManager itself; SA uses only a single AttributeManager per application so it probably doesnt matter, but a class itself is going to have the same properties regardless of which AttributeManager accesses it (and also resets the cache) 4. the raise should use an exception class of some kind (TypeError? exceptions.ArgumentError?) as far as the yield, ive no problem getting rid of that. im beginning to suspect that yield introduces overhead vs. a straight list ? (python interpreter storing stack frames ? dunno). thanks much, this is the kind of user involvement i like. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
instance changes get picked up. if you want the more simplistic comparison method (i.e. using is), set up your PickleType with the flag mutable=False. You meant like this? Column('content', PickleType, mutable=False) thanks sanjay --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
Sanjay wrote: instance changes get picked up. if you want the more simplistic comparison method (i.e. using is), set up your PickleType with the flag mutable=False. You meant like this? Column('content', PickleType, mutable=False) like this Column('content', PickleType(mutable=False)) --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
hey good news, it was mostly something obvious. I now have the the overall time running within 10% of the 0.2.8 test, and the inner time is within 20%. It was another one of those debugger lines which I now made conditional. the other cause for the speed difference is related to the newer functioning of the attributes package vs. 0.2. the 0.3 series is better able to accurately detect what objects have been changed by using a CommittedState object that stores a snapshot of each instance as it is loaded from the database. this is in contrast to what previous versions did whereby whenever a set or append operation took place, the attribute instrumentation would store the old version of the attribute at the point of the attribute changing (and the management of list-based values was very complicated and brittle). so the older version moves the overhead from object instantiation to object manipulation..which you can argue is better from a speed perspective since only a subset of loaded objects will actually be manipulated, if any. Although, 0.3 is more efficient speedwise for the creation and manipulation of objects that were not loaded from the DB...so there is some tradeoff stuff happening here. because of its construction, 0.2.8 had no way of detecting changes in values that did not correspond to a set operation, such as mutable values that are saved via pickling. it was also more complicated code and was not as flexible - in 0.3, the method of copying/comparing attributes is determined by the corresponding TypeEngine instance instead of being hardcoded (and is therefore customizable). the current architecture was modeled after the way hibernate does it. we also are in place to cleanly allow any collection type to be used for collections which was not very flexible in 0.2. so this an example of code that is clearer, more straightforward, and more functional, but is slightly slower in some cases (and a little faster in others). anyway, current speed test in rev 2127: 0.3/rev 2127: 2006-12-04 15:25:37.203150: Loading sis 2006-12-04 15:25:38.619858: 377 sis loaded 2006-12-04 15:25:38.620749: Creating indicators 2006-12-04 15:25:38.785517: 127 indicators created 2006-12-04 15:25:38.786975: registering... 2006-12-04 15:25:38.877868: ok done ! total time 1.67651295662 real0m2.658s user0m1.707s sys 0m0.391s 0.2.8: 006-12-04 15:25:57.421278: Loading sis 2006-12-04 15:25:58.583894: 377 sis loaded 2006-12-04 15:25:58.584764: Creating indicators 2006-12-04 15:25:58.803848: 127 indicators created 2006-12-04 15:25:58.805388: registering... 2006-12-04 15:25:58.824950: ok done ! total time 1.40534806252 real0m2.255s user0m1.457s sys 0m0.379s you can bet the 0.1 series was twice as fast as 0.2, too, since there was an even greater level of hardwired behavior. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
what logging really needs is this: logging.debug(lambda: %s - %s % (_get_data(x),_get_other_data(y)) and the lambda: doesnt execute if logging is disabled; which accounts for a lot more than just doing the % operation or not, such as in my case where i make calls out to _instance_str() to represent an instance string. however, theres still overhead to creating those lambdas. anyway, if you look at what I did, I just set a boolean flag in the constructor based on is_debug_enabled() and just check the flag explicitly before every call to debug()so no method calls or log message construction of any kind happen at all if debug logging is disabled. speed looks a lot more like 0.2.8 now (pending further test cases people want to give me). (the next thing that might help is if i speed up the calls to mapper_extension). also, I challenge your assertion that saying x and y or z is a perlish thing (its a C and Java thing if anything); python 2.5 has just added the y if x else z syntax which is essentially an official version of the same thing. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
Michael Bayer wrote: also, I challenge your assertion that saying x and y or z is a perlish thing (its a C and Java thing if anything); python 2.5 has just added the y if x else z syntax which is essentially an official version of the same thing. Well, I wasn't really talking about 'x and y or z'. I was actually referring to your HUGE incomprehensible one-liner...it wrapped to three lines in my editor. However, the 'x and y or z' idiom is also discouraged because it is NOT the same thing as 'y if x else z'. If it was the same thing then they wouldn't have added that new syntax (which is really ugly IMO, but I digress) to 2.5. The reason they needed a new syntax is because the 'x and y or z' idiom fails if y evaluates to false. Example: x = True y = '' z = 'else' v = x and y or z assert v == y # ERROR! ~ Daniel --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues
it is flawed for that reason yes. fine for non-empty strings though. the new syntax is meant to do correctly what we *want* the x and y or z thing to do. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues (forgot attachments)
you need to forward the *actual test program* which you are running. i have tests for performance which generally show very minute speed differences between 0.2 and 0.3. The logging statements can be sped up by making them conditional and maybe removing some. but im really curious to see what youre doing that is adding such a huge amount of timeif you are inserting thousands of rows, thats something better suited for straight SQL, for example. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: SA 0.3.x: performance issues (forgot attachments)
made the same mod in EagerLoader in rev 2123, and added an eager load profile. its about 30% slower than the same test run in 0.2.8 to load 25000 objects eagerly. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---