About (ways of) query compilation: for example, in dbcook I have query filters expressed as plain Python functions, and that adds 3 or 4 more levels on top of the usual query building. Here are the levels:
expr.translating:
 -a: py-func -> makeExpresion -> expr
 -b: expr -> walk( Translator( context) ) -> SA column expression; the context needs plain vars, and vars-classes/mappers
 -c: context binding: classes / mappers
 -d: context binding: plain vars
 -e: apply the column expression as a select over a query
 -f: sql.bindparams

way0: all, every time: a, c, d, b, e, sql - no bindparams whatsoever
way1: a, c, store; then d, b, e, sql - no bindparams whatsoever
way2: a, b, c, store; then d/bindparams, e, sql
way3: a, b, c, e, store; then d/bindparams, sql
way4: a, b, c, e, sql, store; then d/bindparams

I guess a similar breakdown can be done on the SA query building itself, with the most important division being query-into-statement, statement+bindparams+exec, and instantiation.

I admit it is a bit hard to build the model and the app in a way that uses bindparams in their real power, which is: the whole query statement is like a function, and the bindparams are the arguments to that function. Note I am not talking about the constants that SA automatically makes into bindparams; I am talking about real parameters (see the sketch below).

I must also admit that I am not much of an SQL-er and try to avoid it as much as possible; still, if speed means replacing 1000 hierarchical per-node queries with one huge, awfully twisted query that gives those 1000 in one long shot, so be it (well, it takes... weeks to invent). I guess if SA were, say, 10 times faster, I would not bother doing it - until I ran it on some networked server, found that the reality is about 10,000 nodes, and saw that it didn't really matter whether SA was faster or slower - it's just the wrong way. Not that I like overheads either... but in the long run I think it is correctness, consistency and re/usability that matter. What runs in 5 secs today will run in 3 secs next year on same-price equipment.

It would be quite interesting if you tried these, in various independent combinations, and compared the results:
 a) the query-into-statement "compilation", then binding+executing that 1000 times (roughly the sketch below)
 b) keeping the instances cached
 c) having just _one_ query that returns the same thing the current 1000 queries do, in one long shot - and running that once

I guess combining a) with c) is meaningless, hence the useful combinations seem to be: nothing, a, a+b, c, c+b. It may show interesting sides of the query execution, but give them more than 1000 iterations, maybe 3000 (~15 sec).
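To make the "real parameters" point concrete, here is a minimal sketch of "way3" - not dbcook code, and it assumes the order table from the quoted post below plus some engine already set up. The statement is built and compiled once; per call, only the parameter binding and the execution remain:

from sqlalchemy import bindparam, select

# a,b,c,e: build the statement once, with a *real* parameter
# (not one of the constants SA turns into bindparams automatically)
stmt = select([order], order.c.order_id == bindparam('oid'))

# store: compile once and keep the compiled form around
compiled = stmt.compile(bind=engine)

# per call only d (bind the actual values) + sql (execute) are left
conn = engine.connect()
for oid in (72244, 72245, 72246):
    row = conn.execute(compiled, oid=oid).fetchone()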
ciao
svilen

On Friday 11 July 2008 12:19:26 Henk wrote:
> Hi
>
> I did some profiling of a simple select-type query to check SA
> performance for such use cases.
>
> The test consists of a very simple Order record:
>
> order = Table('tblorder', metadata,
>     Column('order_id', Integer, primary_key = True),
>     Column('order_user_id', Integer),
>     Column('order_state', Integer),
>     Column('order_price', Integer),
>     Column('order_delivery_cost', Integer),
>     Column('order_creation_date', DateTime),
>     Column('order_modification_date', DateTime),
>     Column('order_last_payment_check', DateTime),
>     Column('order_delivery_address_id', Integer),
> )
>
> and some related log lines that keep track of what happens to the
> order:
>
> orderlog = Table('tblorderlog', metadata,
>     Column('orderlog_id', Integer, primary_key = True),
>     Column('orderlog_order_id', Integer,
>            ForeignKey('tblorder.order_id')),
>     Column('orderlog_creation_date', DateTime),
>     Column('orderlog_message', String),
>     Column('orderlog_priority', Integer),
> )
>
> On the ORM side of things these 2 have a simple 1-1 mapping with
> classes Order and OrderLog, and the log lines are mapped to the order
> using:
>
> 'orderLogs': relation(OrderLog, backref='order')
>
> The main loop of the test program is the following:
>
> times = 1000
> start = time.time()
> for i in range(times):
>     order = Session.query(Order).get(72244)
>
>     for log in order.orderLogs:
>         pass
>
>     Session.clear()
>
> end = time.time()
>
> Note that it is the same Order I am fetching all the time, because I
> am interested in SA overhead and not in the speed of the database
> (MySQL) itself; this way MySQL will fetch from memory and not be
> disk-bound (also, in this test I made sure I am not network-bound).
> Also note that the session is cleared on each iteration through the
> loop to force SA to perform the 2 queries and the corresponding
> mapping overhead (1 query for fetching the order, 1 query for
> fetching the 17 corresponding orderlog lines) on each iteration.
>
> Profiling with cProfile (python 2.5.1) gave the following results
> (sorted by total time spent in function):
>
> 2923724 function calls (2880671 primitive calls) in 5.965 CPU seconds
>
> Ordered by: internal time
> List reduced from 498 to 20 due to restriction <20>
>
> ncalls         tottime percall cumtime percall filename:lineno(function)
> 2000           0.585   0.000   0.585   0.000   {method 'query' of '_mysql.connection' objects}
> 18000          0.192   0.000   1.715   0.000   mapper.py:1407(_instance)
> 18000          0.160   0.000   0.801   0.000   mapper.py:1381(populate_state)
> 2000           0.153   0.000   0.259   0.000   base.py:1448(_init_metadata)
> 20000          0.153   0.000   0.198   0.000   times.py:43(DateTime_or_None)
> 36000          0.147   0.000   0.147   0.000   identity.py:91(__contains__)
> 16000          0.139   0.000   0.270   0.000   compiler.py:239(visit_column)
> 112000         0.130   0.000   0.240   0.000   base.py:1363(__getitem__)
> 94000          0.127   0.000   0.318   0.000   strategies.py:71(new_execute)
> 18000          0.122   0.000   0.122   0.000   attributes.py:745(__init__)
> 42000/2000     0.114   0.000   1.233   0.001   compiler.py:176(process)
> 162036/162022  0.113   0.000   0.134   0.000   {getattr}
> 14000          0.106   0.000   0.541   0.000   compiler.py:228(visit_label)
> 2000           0.103   0.000   1.222   0.001   compiler.py:466(visit_select)
> 112000         0.103   0.000   0.110   0.000   base.py:1617(_get_col)
> 32000          0.089   0.000   0.158   0.000   compiler.py:401(_truncated_identifier)
> 72000          0.086   0.000   0.096   0.000   util.py:806(add)
> 135101         0.085   0.000   0.085   0.000   {isinstance}
> 18000          0.073   0.000   0.181   0.000   identity.py:104(add)
> 46000          0.068   0.000   0.080   0.000   {built-in method sub}
>
> What we see here is that of the total of 5.965 seconds, the test spent
> only about 0.585 seconds performing the actual SQL queries, or about
> 10% of the total time. The other 90% of the time is spent on SA
> overhead (except for the times.py line, which is somewhere in the
> MySQLdb driver).
>
> If I look at the cumulative output of the profiler (omitted for
> brevity) I see, for instance, that SA spends 1.3 seconds compiling
> the SQL expressions (it apparently recompiles the same SQL expression
> on each iteration).
>
> My question to the SA community would be: how much ORM overhead do
> you find acceptable? And how does SA compare in this regard to, for
> instance, Java's Hibernate or other ORM solutions?
>
> I would love to help out trying to reduce this overhead, but I have
> no idea what the SA devs have in mind for the future of SA with
> regard to speed (e.g. speed vs. feature completeness). In the
> tutorials and documentation there is currently a strong emphasis on
> use cases highlighting the very dynamic nature of building complex
> queries using SA's ORM tools - not so much on how to perform simple
> (and static) queries fast using the ORM.
>
> Maybe there is a way to memoize the results of the compile step so
> that it does not need to be redone all the time while the Query
> remains the same?
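P.S. As for memoizing the compile step: doing it by hand is what the compile-once sketch earlier in this message shows. And to make my suggestion c) concrete against the quoted test loop: the two queries per iteration (order + log lines) can be collapsed into one shot with an eager load. A rough sketch only, assuming the mapping from the post and SA's eagerload() query option; untested:

from sqlalchemy.orm import eagerload

# one round trip instead of two: the order and its 17 log lines
# come back together via a single LEFT OUTER JOIN
order = Session.query(Order).options(eagerload('orderLogs')) \
               .filter(Order.order_id == 72244).first()
for log in order.orderLogs:
    pass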