[sqlalchemy] Re: SA 0.3.x: performance issues = proposal

2006-12-08 Thread Sébastien LELONG
As attachement, here's the patch (against rev 2132). It's local to 
sqlalchemy/orm/, so:

cd /sqlalchemy/orm
patch -p0  attribute_cache.patch


About the patch itself:
 1. the cache should be a WeakKeyDictionary
OK, done. Performances are still OK, differences between built-in dict are 
negligible.

 2. the cache needs to be cleared for a particular cache (class ?)
If I understand right, while calling register_attribute(), the cache will be 
cleared for the given class, if this one has been cached. Is there anywhere 
else the cache needs to cleared ?

 3. the cache should *probably* be at the module level, and not within
 the AttributeManager itself; SA uses only a single AttributeManager per
 application so it probably doesnt matter
I let the cache within the AttributeManager... Things can be moved at the 
module level, but if it's not necessary right now, it might not be a good 
idea to do it right now (when it will be necessary, the design may have been 
changed and modifications not valid anymore).

 4. the raise should use an exception class of some kind
I've made it raise a TypeError... Yes, string based exceptions are very a bad 
thing !


Attributes are cached while using the managed_attributes() *and* 
noninherited_managed_attributes() funcs. Those are very similar. I hesitated 
refactor them, but centralize cache management is probably a good thing. This 
is the purpose of the second patch, which include this refactoring. As 
expected, performances decrease. Here's the traditionnal benchmark results:

SA 0.3.1, rev 2127:
total time 4.29376721382

real0m5.420s
user0m4.088s
sys 0m0.108s

SA with attr. cache, no refactoring (~2X faster, back closed to 0.2.8 
speed):
total time 2.34819602966

real0m3.013s
user0m2.344s
sys 0m0.088s

SA with attr. cache, with refactoring (30% slower than without 
refactoring)
total time 3.05679416656

real0m3.747s
user0m2.988s
sys 0m0.068s


It's up to you choosing the patch ! IMO, I'm *not* in favor to use refactoring 
in this case :). 

Finally, I've put a clear_attribute_cache func which, ... clear the attribute 
cache. While client code may not have to worry about caching, it may need to 
clear it...


 thanks much, this is the kind of user involvement i like.
Well, you're very welcome ! Glad to help !


Cheers,

Seb
-- 
Sébastien LELONG
sebastien.lelong[at]sirloon.net


P.S:

 im beginning
 to suspect that yield introduces overhead vs. a straight list ?
 (python interpreter storing stack frames ?  dunno).
dunnotoo :)


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---
Index: attributes.py
===
--- attributes.py	(révision 2132)
+++ attributes.py	(copie de travail)
@@ -609,8 +609,18 @@
 
 class AttributeManager(object):
 allows the instrumentation of object attributes.  AttributeManager is stateless, but can be
-overridden by subclasses to redefine some of its factory operations.
+overridden by subclasses to redefine some of its factory operations. Also be aware AttributeManager
+will cache attributes for a given class, allowing not to determine those for each objects (used
+in managed_attributes() and noninherited_managed_attributes()). This cache is cleared for a given class
+while calling register_attribute(), and can be cleared using clear_attribute_cache()
 
+def __init__(self):
+# will cache attributes, indexed by class objects
+self._attribute_cache = weakref.WeakKeyDictionary()
+
+def clear_attribute_cache(self):
+self._attribute_cache.clear()
+
 def rollback(self, *obj):
 retrieves the committed history for each object in the given list, and rolls back the attributes
 each instance to their original value.
@@ -639,21 +649,31 @@
 
 def managed_attributes(self, class_):
 returns an iterator of all InstrumentedAttribute objects associated with the given class.
+if self._attribute_cache.has_key(class_):
+return self._attribute_cache[class_]
+
+self._attribute_cache[class_] = []   
 if not isinstance(class_, type):
-raise repr(class_) +  is not a type
+raise TypeError(repr(class_) +  is not a type)
 for key in dir(class_):
 value = getattr(class_, key, None)
 if 

[sqlalchemy] Re: SA 0.3.x: performance issues = proposal

2006-12-06 Thread Michael Bayer

i think that is a great idea !

can you make a patch for me ?

here are some things i think it should have:

1. the cache should be a WeakKeyDictionary, if possible, so that an
application which creates classes on the fly doesnt run out of memory.
this adds a micro-amount of overhead vs. a dictionary but i think its
negligible (compared to all the isintance calls and yields we are
losing)
2. the cache needs to be cleared for a particular cache whenever the
attribute manager is accessed to add or modify properties on the class
(i.e. the register_attribute method)
3. the cache should *probably* be at the module level, and not within
the AttributeManager itself; SA uses only a single AttributeManager per
application so it probably doesnt matter, but a class itself is going
to have the same properties regardless of which AttributeManager
accesses it (and also resets the cache)
4. the raise should use an exception class of some kind (TypeError?
exceptions.ArgumentError?)

as far as the yield, ive no problem getting rid of that.  im beginning
to suspect that yield introduces overhead vs. a straight list ?
(python interpreter storing stack frames ?  dunno).

thanks much, this is the kind of user involvement i like.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-05 Thread Sanjay

 instance changes get picked up.  if you want the more simplistic
 comparison method (i.e. using is), set up your PickleType with the
 flag mutable=False.

You meant like this?

Column('content', PickleType, mutable=False)

thanks
sanjay


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-05 Thread Michael Bayer


Sanjay wrote:
  instance changes get picked up.  if you want the more simplistic
  comparison method (i.e. using is), set up your PickleType with the
  flag mutable=False.

 You meant like this?

 Column('content', PickleType, mutable=False)
 

like this

Column('content', PickleType(mutable=False))


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-04 Thread Michael Bayer

hey good news, it was mostly something obvious.  I now have the the
overall time running within 10% of the 0.2.8 test, and the inner time
is within 20%.  It was another one of those debugger lines which I now
made conditional.

the other cause for the speed difference is related to the newer
functioning of the attributes package vs. 0.2.  the 0.3 series is
better able to accurately detect what objects have been changed by
using a CommittedState object that stores a snapshot of each instance
as it is loaded from the database.  this is in contrast to what
previous versions did whereby whenever a set or append operation
took place, the attribute instrumentation would store the old version
of the attribute at the point of the attribute changing (and the
management of list-based values was very complicated and brittle).  so
the older version moves the overhead from object instantiation to
object manipulation..which you can argue is better from a speed
perspective since only a subset of loaded objects will actually be
manipulated, if any.  Although, 0.3 is more efficient speedwise for the
creation and manipulation of objects that were not loaded from the
DB...so there is some tradeoff stuff happening here.

because of its construction, 0.2.8 had no way of detecting changes in
values that did not correspond to a set operation, such as mutable
values that are saved via pickling.  it was also more complicated code
and was not as flexible - in 0.3, the method of copying/comparing
attributes is determined by the corresponding TypeEngine instance
instead of being hardcoded (and is therefore customizable).  the
current architecture was modeled after the way hibernate does it.  we
also are in place to cleanly allow any collection type to be used for
collections which was not very flexible in 0.2.  so this an example of
code that is clearer, more straightforward, and more functional, but is
slightly slower in some cases (and a little faster in others).

anyway, current speed test in rev 2127:

0.3/rev 2127:

2006-12-04 15:25:37.203150: Loading sis
2006-12-04 15:25:38.619858: 377 sis loaded
2006-12-04 15:25:38.620749: Creating indicators
2006-12-04 15:25:38.785517: 127 indicators created
2006-12-04 15:25:38.786975: registering...
2006-12-04 15:25:38.877868: ok done !
total time 1.67651295662

real0m2.658s
user0m1.707s
sys 0m0.391s

0.2.8:

006-12-04 15:25:57.421278: Loading sis
2006-12-04 15:25:58.583894: 377 sis loaded
2006-12-04 15:25:58.584764: Creating indicators
2006-12-04 15:25:58.803848: 127 indicators created
2006-12-04 15:25:58.805388: registering...
2006-12-04 15:25:58.824950: ok done !
total time 1.40534806252

real0m2.255s
user0m1.457s
sys 0m0.379s

you can bet the 0.1 series was twice as fast as 0.2, too, since there
was an even greater level of hardwired behavior.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-02 Thread Michael Bayer

what logging really needs is this:

logging.debug(lambda: %s - %s % (_get_data(x),_get_other_data(y))

and the lambda: doesnt execute if logging is disabled; which accounts
for a lot more than just doing the % operation or not, such as in my
case where i make calls out to _instance_str() to represent an instance
string.   however, theres still overhead to creating those lambdas.

anyway, if you look at what I did, I just set a boolean flag in the
constructor based on is_debug_enabled() and just check the flag
explicitly before every call to debug()so no method calls or log
message construction of any kind happen at all if debug logging is
disabled.  speed looks a lot more like 0.2.8 now (pending further test
cases people want to give me).  (the next thing that might help is if i
speed up the calls to mapper_extension).

also, I challenge your assertion that saying x and y or z is a
perlish thing (its a C and Java thing if anything); python 2.5 has just
added the y if x else z syntax which is essentially an official
version of the same thing.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-02 Thread Daniel Miller


Michael Bayer wrote:
 also, I challenge your assertion that saying x and y or z is a
 perlish thing (its a C and Java thing if anything); python 2.5 has just
 added the y if x else z syntax which is essentially an official
 version of the same thing.
 

Well, I wasn't really talking about 'x and y or z'. I was actually referring to 
your HUGE incomprehensible one-liner...it wrapped to three lines in my editor.

However, the 'x and y or z' idiom is also discouraged because it is NOT the 
same thing as 'y if x else z'. If it was the same thing then they wouldn't have 
added that new syntax (which is really ugly IMO, but I digress) to 2.5. The 
reason they needed a new syntax is because the 'x and y or z' idiom fails if y 
evaluates to false. Example:

x = True
y = ''
z = 'else'

v = x and y or z

assert v == y # ERROR!



~ Daniel

--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues

2006-12-02 Thread Michael Bayer

it is flawed for that reason yes.  fine for non-empty strings though.
the new syntax is meant to do correctly what we *want* the x and y or z
thing to do.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues (forgot attachments)

2006-12-01 Thread Michael Bayer

you need to forward the *actual test program* which you are running.  i
have tests for performance which generally show very minute speed
differences between 0.2 and 0.3.  The logging statements can be sped up
by making them conditional and maybe removing some.  but im really
curious to see what youre doing that is adding such a huge amount of
timeif you are inserting thousands of rows, thats something better
suited for straight SQL, for example.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: SA 0.3.x: performance issues (forgot attachments)

2006-12-01 Thread Michael Bayer

made the same mod in EagerLoader in rev 2123, and added an eager load
profile.  its about 30% slower than the same test run in 0.2.8 to load
25000 objects eagerly.


--~--~-~--~~~---~--~~
 You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---