Re: [sqlalchemy] how fast can pure sqlalchemy perform?

2015-08-10 Thread Claudio Freire
On Fri, Aug 7, 2015 at 6:58 PM, kk krm...@gmail.com wrote:
 On Friday 07 August 2015 10:05 PM, Claudio Freire wrote:

 On Fri, Aug 7, 2015 at 12:51 PM, Mike Bayer mike...@zzzcomputing.com
 wrote:

 On 8/7/15 11:05 AM, kk wrote:



 On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:

 Hello.

 ORM is certainly slower. How much depends A LOT on your workload. For
 example
 bulk operations with ORM are an order of magnitude slower than raw SQL.
 On the
 other hand, SQLAlchemy Core lets you write generative SQL queries
 without ORM
 features which are as performant as raw SQL.


 So is there some kind of a method to have some prepared sql
 statements
 in SQLAlchemy itself?
 I have seen that prepared statements in other languages like Java do a
 great job.


 That's totally a myth and you can see me ranting on this whole topic of
 explicit prepared statements == SPEED here:
 https://mail.python.org/pipermail/db-sig/2014-December/006147.html See
 the
 benchmark there.  Whatever performance we get with prepared statements
 is
 vanishingly small and utterly dwarfed by the order-of-magnitude-greater
 latencies we get from Python. The DBAPI already has a great speed
 optimization in this area and it is known as executemany() - it applies
 only
 to CRUD statements, not SELECT, but SQLAlchemy uses executemany() very
 heavily and to great effect - the speed gains here are not so much due to
 prepared statements, as psycopg2 does not use them in any way, but due to
 the fact that we roll up lots of data into a single call that psycopg2
 can
 run from pure compiled C code.

 It may be a myth most of the time, but there are cases where it is not.

 I had one case (in a whole decade of programming, so it is indeed very
 rare) in which a very complex query ran very fast, and planning time
 was the dominant cost (think 150ms for planning and 15ms for
 execution). For that query, preparing it explicitly saved a lot of
 runtime. Again, executemany helps when it's implemented with prepared
 statements. But I was using psycopg2 and it doesn't use prepared
 statements, so I had to prepare them explicitly myself. This was with
 SQLAlchemy 0.3 and I managed to do it just fine. Some black magic was
 needed of course (had to explicitly compile the query to SQL, bind
 parameters, and generate a PREPARE statement from it), but nothing too
 complex.

 Now, pg8000 does use prepared statements, so it may be as simple as
 using that driver if the need for prepared statements is there. You
 can even have two engines pointing to the same database and use pg8000
 only for the queries that really need prepared statements. So
 SQLAlchemy (and DBAPI) has come a long way since I found that case I
 mentioned above. I'm sure today handling that case would have been
 even easier.


 Thanks for the detailed reply.
 Now I understand it better.
 So you mean to say executemany will give me the needed performance gain, in a
 nutshell.
 Is that correct?

In a nutshell, yes.

 Secondly, is executemany good at only Insert, or Update or both?

Both. More precisely, anything that doesn't produce results.
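
For illustration, a minimal sketch of what that looks like with the Core
(the engine URL and table are made up, and this uses the implicit-execution
style of the API of that era): passing a list of parameter dictionaries to a
single execute() call makes SQLAlchemy drive the DBAPI's executemany() for you.

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    engine = create_engine("postgresql+psycopg2://user:pass@localhost/test")
    metadata = MetaData()
    users = Table("users", metadata,
                  Column("id", Integer, primary_key=True),
                  Column("name", String(50)))
    metadata.create_all(engine)

    # one executemany() round-trip for all three rows
    engine.execute(users.insert(), [
        {"name": "a"},
        {"name": "b"},
        {"name": "c"},
    ])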

 And lastly if I have a big resultset through a select statement, more so
 from a view, what is the best approach to use if I decide not to use stored
 procedures?

Well, the bigger your result sets are, the less you care about stored procedures.

Assuming stored procedures speed up queries (which is a big IF that is
very often false), they only speed up the planning phase, not the
execution. There's no execution optimization you cannot accomplish
with raw SQL, so the bigger the result set, the less you care about
planning time, and thus the smaller the relative benefit of using
stored procedures.

Honestly, the benefits of stored procedures are so small, and their
maintenance cost so high, that I would suggest never using them
unless you find a case you've thoroughly analyzed and profiled, and
where you find they'd be a huge help (which won't happen for a good
while).



Re: [sqlalchemy] how fast can pure sqlalchemy perform?

2015-08-10 Thread Claudio Freire
On Mon, Aug 10, 2015 at 6:42 PM, kk krm...@gmail.com wrote:
 Hello,
 Thanks to you and Mike for the detailed insight. My questions follow in-line. On
 Friday 07 August 2015 08:48 PM, Claudio Freire wrote:

 On Fri, Aug 7, 2015 at 12:05 PM, kk krm...@gmail.com wrote:

 On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:

 Hello.

 ORM is certainly slower. How much depends A LOT on your workload. For
 example
 bulk operations with ORM are an order of magnitude slower than raw SQL.
 On
 the
 other hand, SQLAlchemy Core lets you write generative SQL queries
 without
 ORM features which are as performant as raw SQL.


 I am going to be mostly with Postgresql for now, so I don't need database-
 independent queries.  Perhaps we will shift to NoSQL in the near future (more on
 that in some time).
 So if I were to write core queries then I could just as well do it directly with
 psycopg2.  What advantage will I then get by using SQLAlchemy?

For one, building complex queries programmatically is much
easier with Core than with SQL strings, and less error-prone.
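
As a rough sketch of what that looks like in practice (the table and filter
criteria are hypothetical): conditions are composed as Python objects rather
than concatenated strings, so quoting and parameter binding are handled for you.

    from sqlalchemy import select, and_

    def search_users(users, name=None, min_age=None):
        conditions = []
        if name is not None:
            conditions.append(users.c.name == name)
        if min_age is not None:
            conditions.append(users.c.age >= min_age)
        stmt = select([users])
        if conditions:
            stmt = stmt.where(and_(*conditions))
        return stmt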



Re: [sqlalchemy] how fast can pure sqlalchemy perform?

2015-08-07 Thread Claudio Freire
On Fri, Aug 7, 2015 at 12:05 PM, kk krm...@gmail.com wrote:
 On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:

 Hello.

 ORM is certainly slower. How much depends A LOT on your workload. For
 example
 bulk operations with ORM are an order of magnitude slower than raw SQL. On
 the
 other hand, SQLAlchemy Core lets you write generative SQL queries without
 ORM
 features which are as performant as raw SQL.


 So is there some kind of a method to have some prepared sql statements in
 SQLAlchemy itself?
 I have seen that prepared statements in other languages like Java do a great
 job.

Depending on the dialect and driver, you can use executemany.
Not all drivers implement executemany with prepared statements, though,
so check your case.

 Overall SQLAlchemy is an excellent
 library to work with!


 So you mean performance will really get hit when pure ORM is used.
 So should I use a mixture?
 For inserts let's say orm and for bulk select queries some prepared
 statement like thing (if it exists )?

It really depends on each use case.

If your concern is whether ORM queries will be efficient, don't worry,
SQLAlchemy is powerful enough that you can make almost any kind of
query with the ORM. Almost all the optimizations you could do to plain
SQL are doable at the ORM level.

If your concern is CPU overhead on the application side, yes, the ORM
does induce quite an overhead, but whether it's a problem or not
greatly depends on your use case, the number of objects your
transactions will be handling, the complexity of the mapping, your
latency and throughput constraints, etc. I've convinced myself over
time that a little overhead is fine in exchange for the benefits the
ORM gives you, in ease of coding mostly, but also robustness (the ORM
solves some issues that are hard to handle correctly and robustly with
raw SQL), and SQLAlchemy is flexible enough that you can usually
escape to raw SQL if/when you need to. You shouldn't optimize
prematurely; the ORM won't be the death trap that it can be with other
ORMs.

So, we're saying we need more information if we're to give a meaningful answer.



Re: [sqlalchemy] how fast can pure sqlalchemy perform?

2015-08-07 Thread Claudio Freire
On Fri, Aug 7, 2015 at 12:51 PM, Mike Bayer mike...@zzzcomputing.com wrote:
 On 8/7/15 11:05 AM, kk wrote:



 On Friday 07 August 2015 03:03 PM, Ladislav Lenart wrote:

 Hello.

 ORM is certainly slower. How much depends A LOT on your workload. For
 example
 bulk operations with ORM are an order of magnitude slower than raw SQL.
 On the
 other hand, SQLAlchemy Core lets you write generative SQL queries
 without ORM
 features which are as performant as raw SQL.


 So is there some kind of a method to have some prepared sql statements
 in SQLAlchemy itself?
 I have seen that prepared statements in other languages like Java do a
 great job.


 That's totally a myth and you can see me ranting on this whole topic of
 explicit prepared statements == SPEED here:
 https://mail.python.org/pipermail/db-sig/2014-December/006147.html See the
 benchmark there.  Whatever performance we get with prepared statements is
 vanishingly small and utterly dwarfed by the order-of-magnitude-greater
 latencies we get from Python. The DBAPI already has a great speed
 optimization in this area and it is known as executemany() - it applies only
 to CRUD statements, not SELECT, but SQLAlchemy uses executemany() very
 heavily and to great effect - the speed gains here are not so much due to
 prepared statements, as psycopg2 does not use them in any way, but due to
 the fact that we roll up lots of data into a single call that psycopg2 can
 run from pure compiled C code.


It may be a myth most of the time, but there are cases where it is not.

I had one case (in a whole decade of programming, so it is indeed very
rare) in which a very complex query ran very fast, and planning time
was the dominant cost (think 150ms for planning and 15ms for
execution). For that query, preparing it explicitly saved a lot of
runtime. Again, executemany helps when it's implemented with prepared
statements. But I was using psycopg2 and it doesn't use prepared
statements, so I had to prepare them explicitly myself. This was with
SQLAlchemy 0.3 and I managed to do it just fine. Some black magic was
needed of course (had to explicitly compile the query to SQL, bind
parameters, and generate a PREPARE statement from it), but nothing too
complex.

Now, pg8000 does use prepared statements, so it may be as simple as
using that driver if the need for prepared statements is there. You
can even have two engines pointing to the same database and use pg8000
only for the queries that really need prepared statements. So
SQLAlchemy (and DBAPI) has come a long way since I found that case I
mentioned above. I'm sure today handling that case would have been
even easier.
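
A minimal sketch of that two-engine idea (the URLs are placeholders, and
`complex_query` stands in for the expensive, planning-heavy statement):

    from sqlalchemy import create_engine

    # psycopg2 engine for the bulk of the work
    main_engine = create_engine("postgresql+psycopg2://user:pass@host/db")
    # pg8000 engine reserved for the queries that benefit from prepared statements
    prepared_engine = create_engine("postgresql+pg8000://user:pass@host/db")

    rows = prepared_engine.execute(complex_query).fetchall()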



Re: [sqlalchemy] Handling big Python objects

2014-12-03 Thread Claudio Freire
On Wed, Dec 3, 2014 at 7:23 PM, Andrea Gavana andrea.gav...@gmail.com wrote:
 3. As an aside, not zlib-ing the files saves about 5 seconds/simulation
 (over a 20 seconds save) but increases the database size by 4 times. I'll
 have to check if this is OK.

You can usually specify a compression level of 1 to get a considerable
speedup if you care about speed more than disk efficiency.
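
For example, something along these lines (the payload variable is a
placeholder for the blob being stored):

    import zlib

    payload = simulation_blob          # the large binary/pickled data to store
    fast = zlib.compress(payload, 1)   # level 1: much faster, somewhat larger
    small = zlib.compress(payload, 9)  # level 9: smallest output, slowest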



Re: [sqlalchemy] objects not necessarily pulled from session's identity map when they should (?)

2014-11-04 Thread Claudio Freire
On Tue, Nov 4, 2014 at 3:15 PM, Jonathan Vanasco jvana...@gmail.com wrote:

 I have a permissions check routine that repeatedly queries for a certain
 Foo2Bar table

 class Foo2Bar(Base):
     __tablename__ = 'foo_2_bar'
     id_foo = Column(Integer, ForeignKey("foo.id"), primary_key=True)
     id_bar = Column(Integer, ForeignKey("bar.id"), primary_key=True)

 print "Get Foo2Bar() 4x"
 print id(s.query(Foo2Bar).get((1,2)))
 print id(s.query(Foo2Bar).get((1,2)))
 print id(s.query(Foo2Bar).get((1,2)))
 print id(s.query(Foo2Bar).get((1,2)))

 When I do this on a test harness, it works exactly like it should.  It only
 hits the database once.

 When I do this in my app, it hits the database (postgres) unreliably.

...

 does anyone have a clue what could cause this behavior?

The code as shown would likely exhibit that behavior with a weak
identity map. It would indeed be hard to predict when it happens,
since weakrefs are cleared on garbage collection cycles, and those
happen at hard-to-predict times.

You could rule that out, by storing the result of get() in some
variable somewhere for the duration of the test/routine. That should
keep it in the identity map long enough to serve your purposes.
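
Roughly, the suggestion amounts to this (a sketch using the Foo2Bar class
from above):

    def check_permissions(s):
        # the strong reference keeps the instance alive in the weak identity map
        f2b = s.query(Foo2Bar).get((1, 2))   # hits the database at most once
        for _ in range(4):
            # subsequent get() calls are served from the identity map
            assert s.query(Foo2Bar).get((1, 2)) is f2b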



Re: [sqlalchemy] statement.compile and legacy is not null code

2014-10-01 Thread Claudio Freire
On Wed, Oct 1, 2014 at 6:51 PM, Jonathan Vanasco jvana...@gmail.com wrote:

   # 0.7.9 and later
   sq = sess.query(A.id.label('id')).filter(A.id != None)


I've used ~(id == None) since 0.3 quite successfully (though I've
eventually switched to != None since it's way more natural so not sure
if there's any place where it doesn't compile well)



Re: [sqlalchemy] Closing pool connections

2014-09-25 Thread Claudio Freire
On Thu, Sep 25, 2014 at 5:33 AM, Thierry Florac tflo...@gmail.com wrote:
 Hi,

 I have a multi-threaded web application using SQLAlchemy connections pool.
 As soon as a connection is opened, it's maintained opened in the pool even
 if it's not used anymore. In the long run, this can consume too many
 unnecessary connections and database server resources.

 So what I'm actually looking for is a way to close a given connection which
 is returned to the pool if it wasn't used for a given amount of time...

 Any idea ?


A while back when I wanted this, I ended up subclassing QueuePool and
using that subclass with the engine.

Sadly, I don't have the code at hand. But I have managed without it
since then, so I'm wondering if this is really an issue. Currently,
I'm using postgres with pgbouncer, and the latter handles
separating idle from active connections very well.
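
For reference, the subclassing approach mentioned above looks roughly like
this (the idle-timeout logic itself is left as a stub; the class name and
engine URL are made up):

    from sqlalchemy import create_engine
    from sqlalchemy.pool import QueuePool

    class IdleClosingPool(QueuePool):
        # Stub: the real work is overriding checkin/checkout behaviour to track
        # when each connection was last used and to close connections that sat
        # idle longer than some threshold.
        pass

    engine = create_engine("postgresql://user:pass@host/db",
                           poolclass=IdleClosingPool,
                           pool_size=5, max_overflow=10)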



Re: [sqlalchemy] Dangers of setting a session's expire_on_commit=False?

2014-07-25 Thread Claudio Freire
On Fri, Jul 25, 2014 at 7:55 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 Don't objects potentially have this problem anyway, in the sense that if
 they are accessed TWICE after a commit, the second access will use the data
 cached from the first, and could again be out of date?


 only if you have a low transaction isolation level set up.   The Session
 tries to make a choice as to the most reasonable place that concurrent
 changes should be anticipated.  Transaction demarcations are the best place.
 If you are expecting to code your app to specifically expect READ COMMITTED
 or READ UNCOMMITTED behavior where your transaction relies upon seeing a row
 change value from a concurrent transaction, that’s a special use case, in
 which case you can use expire() for those object that have this requirement.
 The ORM Session can obviously not guess when such an expiration is to be
 detected otherwise.


I don't see how transaction isolation levels relate to this.

The effect of disabling expire_on_commit is that of not seeing
subsequent commits. It would be a fictitious DO NOT READ COMMITTED
level. Having it on somewhat caters to possible SERIALIZABLE settings,
where strict ordering is to be expected, since without serializable
transactions there's no way expiring helps correctness in any way.
None of those seem overly common to me, so I don't see how one can
ignore the serialization level in effect or possible concurrent
updates that are happening at the same time, with or without
expire_on_commit.

IMHO, expire_on_commit is something that really has no sensible
default. You pick your own, the library authors pick one default
because, well, why not?

For the record, I'm using expire_on_commit=off, because I also use
model instances outside the scope of their originating transaction.
I've had no problems with it, but I did have to be very careful with
the semantics and lifetime of those objects, and of expiring manually
anything I put on concurrently-accessed structures, lest someone
modify it before the session's scope is over and it gets (or tries to
be) committed to the DB.
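
In code, that setup looks roughly like this (the engine and model are
placeholders; the manual expire() call is the part that needs care):

    from sqlalchemy.orm import sessionmaker

    Session = sessionmaker(bind=engine, expire_on_commit=False)

    session = Session()
    obj = session.query(Thing).get(1)
    session.commit()
    # obj's loaded attributes remain usable here, but may grow stale.
    # Anything shared across threads/transactions should be expired explicitly
    # so the next attribute access reloads it from the database:
    session.expire(obj)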



Re: [sqlalchemy] Dangers of setting a session's expire_on_commit=False?

2014-07-25 Thread Claudio Freire
On Fri, Jul 25, 2014 at 10:09 PM, Michael Bayer
mike...@zzzcomputing.com wrote:
 For the record, I'm using expire_on_commit=off, because I also use
 model instances outside the scope of their originating transaction.
 I've had no problems with it, but I did have to be very careful with
 the semantics and lifetime of those objects, and of expiring manually
 anything I put on concurrently-accessed structures, lest someone
 modify it before the session's scope is over and it gets (or tries to
 be) committed to the DB.

 Maybe expire on begin might be useful.   So your data is still there after 
 commit, but if you start a new transaction, then things refresh.   I'm up for 
 it in a 1.0 release, if you think it's useful.   Though explaining to 
 everyone another option...what a PITA

I don't see how that'd work.

On session.add? When set as dependent of an object of another (active) session?

Seems very error prone to make it too automatic.

All in all, the current state isn't bad. It's understandable, and controllable.

I'm just saying there's no one-size-fits-all.



Re: [sqlalchemy] Mapping lots of similar tables / database design

2014-04-01 Thread Claudio Freire
On Tue, Apr 1, 2014 at 8:07 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 Think about it.  You have 2000 individual Python classes all called
 TimeSeries, all referring to a different table.  This is the question you
 have to answer (and which I think is going to wind you up back at one
 table):

 1. which one of those 2000 TimeSeries classes do I refer to when I say
 Location.timeseries ?

 2. what SQL would a query like query(Location).join(timeseries) produce?
 Are you looking for an enormous UNION of 2000 tables?  that's not going to
 work.

 Think about this in terms of SQL.   If you want to query across *all*
 timeseries at once, having them all in 2000 tables is not going to be
 possible.  If you did have that setup, the approach would be to write a
 map/reduce query that hits each TimeSeries table separately and figures it
 out at that level, just like if you were running a noSQL database like
 CouchDB or something like that.


Thing is, this is not relational data.

I believe you need a columnar data store.

Which, I guess, is up to you to choose - I don't know many open-source ones.



Re: [sqlalchemy] filter vs get question

2014-02-20 Thread Claudio Freire
On Thu, Feb 20, 2014 at 5:22 PM, Jonathan Vanasco jonat...@findmeon.com wrote:
 this seems to work, but I want to just make sure this is the intended
 behavior.

a = dbSession.query( Something ).filter( Something.primary_key == 1
 ).first()
b = dbSession.query( Something ).get( 1 )
c = dbSession.query( Something ).get( 1 )
d = dbSession.query( Something ).get( 1 )
e = dbSession.query( Something ).get( 1 )

 in the above example , we will only hit the database once , because the
 'filter' populates the local session map with the primary key.  right ?


Only as long as you keep a reference to the object returned by the
first query, since the identity map is a weak map.



Re: [sqlalchemy] Sharding without the ORM

2014-02-19 Thread Claudio Freire
On Wed, Feb 19, 2014 at 1:03 PM, Silviu Dicu silviud...@gmail.com wrote:
 Hi,

 I have a legacy application that doesn't use the ORM but I need to implement
 horizontal sharding on MySQL.
 Has anybody done something similar, or does anyone have any ideas?

You mean besides manually computing a consistent hash and binding the
query to one engine or another?
(which btw isn't hard at all)
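
A bare-bones sketch of that (shard URLs, the hash, and the query are all
placeholders; a true consistent-hash ring only matters if you expect the shard
count to change):

    import zlib
    from sqlalchemy import create_engine, text

    shards = [
        create_engine("mysql://user:pass@shard0/db"),
        create_engine("mysql://user:pass@shard1/db"),
    ]

    def engine_for(key):
        # any stable hash of the sharding key will do; crc32 is just an example
        return shards[zlib.crc32(str(key).encode()) % len(shards)]

    def orders_for(user_id):
        eng = engine_for(user_id)
        return eng.execute(text("SELECT * FROM orders WHERE user_id = :uid"),
                           uid=user_id).fetchall()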



[sqlalchemy] Cannot use pgbouncer's statement mode

2014-02-17 Thread Claudio Freire
Hello there. I'm using SQLAlchemy 0.7.10 with postgres+pgbouncer, and
I'm trying to move from a transaction-mode pool towards a
statement-mode pool.

The reason for this move, is that I've got quite a high volume of
reads that cannot be quickly serviced in transaction mode, since many
connections are left idling in a transaction for perhaps up to 1
second. A combination of network roundtrips and GIL-related delays
conspire to get me into this position.

So, I've been refactoring my application so that I can use two kinds
of sessions, read-write normal transactional ones, and read-only ones
that use autocommit and a different pgbouncer configured for statement
pooling.

The problem I run into is that even in autocommit mode, SQLAlchemy itself
issues multi-statement transactions, like this:

>>> s = get_slave_session(autocommit=True)
>>> c = s.connection()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/jampp/analytics/matlogbrowser/model/base.py", line 80, in
connection
return sqlalchemy.orm.Session.connection(self, *p, **kw)
  File build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py, line
784, in connection
  File build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py, line
788, in _connection_for_bind
  File build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py, line
313, in _connection_for_bind
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py, line
2489, in contextual_connect
  File build/bdist.linux-x86_64/egg/sqlalchemy/pool.py, line 236, in connect
  File build/bdist.linux-x86_64/egg/sqlalchemy/pool.py, line 401, in __init__
  File build/bdist.linux-x86_64/egg/sqlalchemy/pool.py, line 746, in _do_get
  File build/bdist.linux-x86_64/egg/sqlalchemy/pool.py, line 189, in
_create_connection
  File build/bdist.linux-x86_64/egg/sqlalchemy/pool.py, line 287, in __init__
  File build/bdist.linux-x86_64/egg/sqlalchemy/event.py, line 380,
in exec_once
  File build/bdist.linux-x86_64/egg/sqlalchemy/event.py, line 389, in __call__
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/strategies.py,
line 167, in first_connect
  File build/bdist.linux-x86_64/egg/sqlalchemy/dialects/postgresql/base.py,
line 1000, in initialize
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py,
line 171, in initialize
  File build/bdist.linux-x86_64/egg/sqlalchemy/dialects/postgresql/base.py,
line 1184, in _get_server_version_info
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py, line
1449, in execute
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py, line
1628, in _execute_text
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py, line
1698, in _execute_context
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py, line
1691, in _execute_context
  File build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py,
line 331, in do_execute
sqlalchemy.exc.DatabaseError: (DatabaseError) Long transactions not allowed
ERROR:  Long transactions not allowed
 'select version()' {}


Here, get_slave_session returns a subclass of Session that will
override connection() to return a connection that has
execution_options(autocommit=True) applied (because for pgbouncer,
even the implicit transaction for selects is a transaction). The
session itself is also built with autocommit=True.

Now, it seems the session is using a connection that is independent
of what connection() returns. What do I have to override to get the
desired behavior? (that is: one query per transaction no matter which
type of query).



Re: [sqlalchemy] Cannot use pgbouncer's statement mode

2014-02-17 Thread Claudio Freire
On Mon, Feb 17, 2014 at 9:31 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 the Python DBAPI as you know returns a connection that's in a transaction. So 
 if you are using an environment that has zero tolerance for even two 
 statements in one transaction, you should turn on autocommit at the 
 psycopg2 level, using the connection.autocommit flag.   you can achieve 
 this with an on-connect event, with some tricks currently needed to 
 absolutely force it as the first thing that happens (I need to either 
 document this or make a nicer API, but this works for now), like this:

 from sqlalchemy import event, create_engine

 eng = create_engine("postgresql+psycopg2://scott:tiger@localhost/test",
                     echo=True)

 @event.listens_for(eng, "first_connect", insert=True)
 @event.listens_for(eng, "connect", insert=True)
 def setup_autocommit(dbapi_conn, rec):
     dbapi_conn.autocommit = True

 conn = eng.connect()
 print conn.execute("select 1").fetchall()

This is exactly what I need, thanks. Furthermore, with psycopg2 level
autocommit, I don't think I even need a second statement-based pool,
it will decrease idle-in-transaction time on its own. In fact, I
thought this is what execution_options(autocommit=True) did, though
looking at the code, now I realize it wasn't.

Many thanks :)

PS: Yeah, I know I have to update to 8.x ;-)



Re: [sqlalchemy] Prepared Statements in Postgresql

2014-02-12 Thread Claudio Freire
On Wed, Feb 12, 2014 at 5:03 PM, Tony Locke tlo...@tlocke.org.uk wrote:

 I've noticed some opinions online indicating that psycopg2 does not
 have prepared statement support (e.g. -

 http://www.cerias.purdue.edu/site/blog/post/beware_sql_injections_due_to_missing_prepared_statement_support/)

 the comment at the bottom of that post ultimately references a psycopg2
 message from 2007 so you'd need to ask the psycopg2 folks for updated
 information.  However psycopg2 can do an executemany with great
 efficiency as it is using methodologies for which you'd have to ask them,
 so if they don't use PG's actual prepared mechanism, its probably
 unnecessary.  psycopg2 is an extremely mature and high performing product.

What it doesn't support is libpq's wire protocol for prepared
statements. But you can prepare and execute statements by issuing the
corresponding SQL (that will use the wire protocol for SQL execution,
which is a tad less efficient but still more efficient than separate
queries).

psycopg2's executemany isn't much more sophisticated than multiple
separate queries since it internally does exactly that. It may be a
tad faster since it's done in C, but I doubt the difference is
significant.

But, if you want an executemany that works in psycopg2 as it would in
pg8000, you can PREPARE and then executemany the EXECUTE queries.
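
For instance, a hypothetical sketch with a raw psycopg2 connection (the DSN,
table and values are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder DSN
    cur = conn.cursor()
    cur.execute("PREPARE ins_item (int, text) AS "
                "INSERT INTO items (id, name) VALUES ($1, $2)")
    # executemany() then drives one EXECUTE per row
    cur.executemany("EXECUTE ins_item (%s, %s)",
                    [(1, "a"), (2, "b"), (3, "c")])
    cur.execute("DEALLOCATE ins_item")
    conn.commit()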



Re: [sqlalchemy] Session remove/close MySQL

2014-02-07 Thread Claudio Freire
I knew I should've been more explicit

On Fri, Feb 7, 2014 at 1:07 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 On Feb 7, 2014, at 11:01 AM, Claudio Freire klaussfre...@gmail.com wrote:


 I've had similar issues with 0.7.10. SA opens an implicit transaction,


 incorrect, DBAPI does this, please see:
 http://www.python.org/dev/peps/pep-0249/#commit

Ok, yeah. The point is, it's open.


 and neither Session.remove nor Session.close really roll back the
 transaction

No, but the connection pool should. (reset_on_return, which I have enabled)

 The connection pool, if in use, will then not
 actually close the connection if it is to remained pooled, it calls
 rollback() as part of the pool release mechanism.  Recent versions of
 SQLAlchemy allow this to show up in the engine logs like any other rollback,
 so you probably wouldn't have noticed.

And *this* is what was not happening. Somehow, transactions remained
open on the database (I checked).



Re: [sqlalchemy] Session remove/close MySQL

2014-02-07 Thread Claudio Freire
On Fri, Feb 7, 2014 at 2:35 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 The connection pool, if in use, will then not
 actually close the connection if it is to remained pooled, it calls
 rollback() as part of the pool release mechanism.  Recent versions of
 SQLAlchemy allow this to show up in the engine logs like any other rollback,
 so you probably wouldn't have noticed.

 And *this* is what was not happening. Somehow, transactions remained
 open on the database (I checked).

 that kind of thing generally happens to people when they aren't cleaning up 
 their sessions, or are using awkward engine/connection patterns.   the pool 
 has had a lot of bugs fixed but I haven't seen a bug where the pool isn't 
 emitting the rollback when the connection is marked closed.

There was an awkward pattern involved: using the session's connection
as returned by Session.connection() manually to issue some textual
SQL. Other than that, normal thread-local session stuff.



Re: [sqlalchemy] Session remove/close MySQL

2014-02-07 Thread Claudio Freire
On Fri, Feb 7, 2014 at 11:51 AM, Simon King si...@simonking.org.uk wrote:

 On Fri, Feb 7, 2014 at 9:28 AM, Christian Démolis 
 christiandemo...@gmail.com wrote:

 Hi all,

 Actually, i have some problem closing my session...
 I tried using scopedsession with session.remove
 I tried using normal session with session.close

 But in both cases, the Mysql session stay open.


 Why closing session has no effet on current Mysql connections ?


 SQLAlchemy maintains a pool of connections to the database. When you start
 a session, it checks a connection out from the pool, and when you close the
 session, it returns it to the pool. There are various configuration
 parameters you can use to control how the pool works. See the docs at
 http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html



I've had similar issues with 0.7.10. SA opens an implicit transaction, and
neither Session.remove nor Session.close really rolls back the transaction
(even though they should; I've seen lots of experimental evidence that they
don't always do it).

So, what I suggest, is issuing a session.commit() or session.rollback()
(according to your transactional needs) before the session.close/remove
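
i.e. something along these lines:

    try:
        session.commit()      # or session.rollback(), depending on your needs
    finally:
        session.close()       # or Session.remove() for a scoped_session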



Re: [sqlalchemy] prepared statements

2014-02-06 Thread Claudio Freire
On Thu, Feb 6, 2014 at 3:08 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 I was imagining/hoping I'd find something like this:

# prepare the query:
myPreparedQuery = mySession.query(stuff).filter(parameter 
 definitions).prepare()

# run the query whenever I need it during my session:
myPreparedQuery.parameters(stuff).fetchall()

 prepared statements don't really apply much to SELECT statements, the 
 performance gains from such are marginal as you typically invoke a particular 
 SELECT just once within a transaction, and prepared statements don't 
 necessarily carry across transaction or connection boundaries on backends.

That's simply false. Because of the following:

 There are various performance concerns related to SELECT queries and solving 
 them depends much on identifying where a particular performance issue 
 resides.   There's overhead on the side of Python/SQLAlchemy formulating the 
 string statement, theres overhead on the side of passing it to the database 
 to be parsed (which is by far the most infinitesimal part)

That part is most definitely not always so infinitesimally small. I
recall one case (granted, one among so many) of a complex query that
was very fast to execute. The query had dozens of joins, so it took
considerable planning time on the database side. Planning took around
300ms, where execution took only 25ms. So, having spotted the
bottleneck, I switched to using prepared statements (it was a query
that was run several times in the session, not millions of times, but
several) with different arguments. The benefit was considerable.

To do that (I was using SQLA 0.5) I had to compile the select object
and generate the query string; I used Text in the bindparams to
replace them with $1, $2, etc... (as postgres likes it), and then
built a PREPARE statement and an EXECUTE one. It was rather
simple, having SQLA generate the query string.

The result was a 10-fold increase in performance.

DBAPI doesn't have direct support for prepared statements, but hacking
them isn't that difficult. You just have to make sure you send the
PREPARE, EXECUTE and the DEALLOCATE (never forget the DEALLOCATE) on
the same connection. Which is easy, just ask SQLAlchemy's session for
its connection and use it.
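
A very rough sketch of that manual approach (the table, parameter types and
the $1-style rewriting of bind parameters are glossed over here; the point is
that PREPARE, EXECUTE and DEALLOCATE must all go through the same connection,
which the session can hand you):

    from sqlalchemy import text

    conn = session.connection()
    conn.execute(text("PREPARE big_query (int) AS "
                      "SELECT * FROM big_table WHERE x = $1"))
    try:
        for value in (1, 2, 3):
            rows = conn.execute(text("EXECUTE big_query (:v)"), v=value).fetchall()
    finally:
        conn.execute(text("DEALLOCATE big_query"))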



Re: [sqlalchemy] prepared statements

2014-02-06 Thread Claudio Freire
On Thu, Feb 6, 2014 at 4:00 PM, Jonathan Vanasco jonat...@findmeon.com wrote:
 just wondering -- would it be possible to mimic this behavior using a custom
 view for this select ( in postgresql ) and then querying that ? i think the
 query planner might only run on the creation.

I think views don't cache the plan; they're handled as rules.

What you'd need is a function (pg's version of stored procedures).



Re: [sqlalchemy] prepared statements

2014-02-06 Thread Claudio Freire
On Thu, Feb 6, 2014 at 4:29 PM, Rick Otten rottenwindf...@gmail.com wrote:
 Thanks Claudio - I'll mull over the pros and cons of explicitly managing the
 connections to prepare the statements vs just being patient while the job
 runs vs gains I might achieve elsewhere.

Remember, prepared statements will only help if planning time is a
considerable fraction of execution time. The queries in need of this
would be complex. Try to EXPLAIN ANALYZE the queries you repeatedly
execute to see whether preparing them is worth the hassle. I don't
remember which version of postgres introduced planning time in the
explain output, but if you don't have it, just take the difference between
the reported execution time and what you time from your end, minus
network latency; that should be equal to what you'd save by preparing.
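
For example (the engine and query are placeholders; newer Postgres versions
print the planning and execution times at the bottom of the plan):

    plan = engine.execute("EXPLAIN ANALYZE SELECT * FROM big_table WHERE x = 1")
    for line, in plan:
        print(line)
    # Look for "Planning time: ... ms" vs "Execution time: ... ms"; if planning
    # dominates, preparing the statement is worth considering.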



Re: [sqlalchemy] prepared statements

2014-02-06 Thread Claudio Freire
On Thu, Feb 6, 2014 at 4:38 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 On Feb 6, 2014, at 1:31 PM, Claudio Freire klaussfre...@gmail.com wrote:


 There are various performance concerns related to SELECT queries and 
 solving them depends much on identifying where a particular performance 
 issue resides.   There's overhead on the side of Python/SQLAlchemy 
 formulating the string statement, theres overhead on the side of passing it 
 to the database to be parsed (which is by far the most infinitesimal part)

 That part is most definitely not always so infinitesimally small. I
 recall one case (granted, one among so many) of a complex query that
 was very fast to execute. The query had dozens of joins, so it took
 considerable planning time on the database side. Planning took around
 300ms, where execution took only 25ms. So, having spotted the
 bottleneck, I switched to using prepared statements (it was a query
 that was run several times in the session, not millions of times, but
 several) with different arguments. The benefit was considerable.

 ideally the database would know how to cache plans based on the string form 
 of the statement.  Oracle does this.  It appears that Postrgresql does not 
 unless you specifically use a prepared statement.

Well, it cannot. At least not simplistically, because postgres uses
the literals in the statement (the ones likely to change) to
specialize plan cost, and different values can produce different plans
(depending on stats). For instance, a query with a where clause
deleted=false that selects 99% of a table would do a sequential
scan, whereas deleted=true might use an index (if there was such an
index).

They thus don't cache plans unless specifically asked for (with
prepare), because those plans may very well be inferior (they don't
use value-specific knowledge).



Re: [sqlalchemy] prepared statements

2014-02-06 Thread Claudio Freire
On Thu, Feb 6, 2014 at 4:47 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 On Feb 6, 2014, at 2:43 PM, Claudio Freire klaussfre...@gmail.com wrote:

 On Thu, Feb 6, 2014 at 4:38 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 On Feb 6, 2014, at 1:31 PM, Claudio Freire klaussfre...@gmail.com wrote:


 There are various performance concerns related to SELECT queries and 
 solving them depends much on identifying where a particular performance 
 issue resides.   There's overhead on the side of Python/SQLAlchemy 
 formulating the string statement, theres overhead on the side of passing 
 it to the database to be parsed (which is by far the most infinitesimal 
 part)

 That part is most definitely not always so infinitesimally small. I
 recall one case (granted, one among so many) of a complex query that
 was very fast to execute. The query had dozens of joins, so it took
 considerable planning time on the database side. Planning took around
 300ms, where execution took only 25ms. So, having spotted the
 bottleneck, I switched to using prepared statements (it was a query
 that was run several times in the session, not millions of times, but
 several) with different arguments. The benefit was considerable.

 ideally the database would know how to cache plans based on the string form 
 of the statement. Oracle does this.  It appears that Postrgresql does not 
 unless you specifically use a prepared statement.

 Well, it cannot. At least not simplistically, because postgres uses
 the literals in the statement (the ones likely to change)
 to
 specialize plan cost, and different values can produce different plans
 (depending on stats). For instance, a query with a where clause
 deleted=false that selects 99% of a table would do a sequential
 scan, whereas deleted=true might use an index (if there was such an
 index).

 They thus don't cache plans unless specifically asked for (with
 prepare), because those plans may very well be inferior (they don't
 use value-specific knowledge).

 right, but from an API POV, its more tedious since we have to track the 
 handle on the outside.

 This could be implemented as a cursor event even in conjunction with a custom 
 execution option and possibly using compiled_cache as well.

That would be tricky. You'd have to maintain a tally of which
connection prepared which query (think the complications of
multi-engine setups), and make sure you deallocate the handle before
returning the connection to the pool. Not to mention coming up with
unique handles.



Re: [sqlalchemy] Strange behaviour in multi-process environment

2014-01-20 Thread Claudio Freire
On Mon, Jan 20, 2014 at 1:51 PM, Richard Gerd Kuesters
rich...@humantech.com.br wrote:
 i was having this kind of problem while using a multi-threaded app, but with
 a postgres backend.

 in postgres, with high-concurrency, i was expecting this kind of behaviour,
 so i had to implement some simple semaphores to make it work properly --
 sqlalchemy is not to blame if something changes in your database while one
 end of the software creates a record that another end doesn't know about when
 trying to operate on it.

I think it may have something to do with the implicit transaction the
session opens when manipulated.

P2 may be holding on to an older snapshot.

I'd suggest committing right before checking to see whether the
records are there in P2, just as an experiment. The right behavior
would be to commit after any kind of session manipulation, though
that's trickier (lots of things count as implicit session
manipulations).



Re: [sqlalchemy] Why ORM objects in a session expire immediately after a failure in flush?

2014-01-10 Thread Claudio Freire
On Thu, Jan 9, 2014 at 9:59 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 I do notice that you’re catching an IntegrityError.The typical pattern
 when writing code that wants to catch these and then continue is to run the
 individual set of questionable operations within a SAVEPOINT, that is a
 begin_nested()/commit() block.  Recent versions of SQLAlchemy have the
 behavior such that when a flush() exception rolls back the SAVEPOINT,
 objects that were not modified within the SAVEPOINT are not expired; only
 those objects that changed within the save point’s scope do.

How recent does recent mean there? (just curious)
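
For reference, the SAVEPOINT pattern Mike describes looks roughly like this
(the model and data are placeholders):

    from sqlalchemy.exc import IntegrityError

    for row in rows:
        try:
            with session.begin_nested():      # emits SAVEPOINT
                session.add(Thing(**row))
                session.flush()
        except IntegrityError:
            # only the savepoint is rolled back; objects outside it keep their state
            pass
    session.commit()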



Re: [sqlalchemy] Custom logic for query execution

2014-01-10 Thread Claudio Freire
On Fri, Jan 10, 2014 at 3:50 PM, Sylvester Steele
sylvesterste...@gmail.com wrote:
 Hi,
 Currently I am using the sqlalchemy engine to execute string queries only. I
 do plan on using sqlalchemy more extensively (including ORM) in the near
 future.
 I need to add retry logic on every query, in case of some database failures
 (less than ideal, but the server is a bit flaky).

 Questions:
 1. Would subclassing the Engine and overriding the execute function achieve
 this?

Let me first point out that you cannot consider all queries equal
and retry-able.

The first problem you'll face is secondary effects. Updates and inserts
obviously have them, but secondary effects hidden in select queries could
also be an issue (say, triggers or something else that has a secondary
effect).

The second problem you'll face is transactions. When a failure occurs,
you usually can't retry just a single query; you have to retry
the whole transaction.

So doing it at the Engine level may prove a bad choice with the above in mind.
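
If the flakiness really has to be worked around in code, a sketch of retrying
at the transaction level instead might look like this (the exception to catch
and the session setup are placeholders):

    from sqlalchemy.exc import OperationalError

    def run_with_retry(session_factory, work, retries=3):
        for attempt in range(retries):
            session = session_factory()
            try:
                result = work(session)   # every query of the transaction goes here
                session.commit()
                return result
            except OperationalError:
                session.rollback()
                if attempt == retries - 1:
                    raise
            finally:
                session.close()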



Re: [sqlalchemy] SQLAlchemy 0.9.1 released

2014-01-08 Thread Claudio Freire
Typo:

when condition is non-empty

should be

when conditionS is non-empty


On Wed, Jan 8, 2014 at 1:53 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 a new section has been added as the first “Core Behavioral Change”:

 http://sqlalchemy.readthedocs.org/en/rel_0_9/changelog/migration_09.html#none-can-no-longer-be-used-as-a-partial-and-constructor



 On Jan 8, 2014, at 11:27 AM, Michael Bayer mike...@zzzcomputing.com wrote:


 geez..its 10 degrees here, sorry, just scratch that first case, it has to be
 like this to be fully compatible both ways:

 def my_select(conditions):
     stmt = select([column('x')])
     if conditions:
         stmt = stmt.where(and_(*conditions))
     return stmt

 “cond & None” was never any kind of publicly documented behavior and it was
 inconsistent, sorry.








 On Jan 8, 2014, at 11:22 AM, Michael Bayer mike...@zzzcomputing.com wrote:

 sorry, this should read:

 Therefore, your script cannot work in either 0.8 or 0.9, unless you fix it
 as follows, in which case it works the same in both versions:

 def my_select(conditions):
     cond = None
     for c in conditions:
         cond = c & cond
     stmt = select([column('x')])
     if cond is not None:
         stmt = stmt.where(cond)
     return stmt



 On Jan 8, 2014, at 11:20 AM, Michael Bayer mike...@zzzcomputing.com wrote:

 Therefore, your script cannot work in either 0.8 or 0.9, unless you fix it
 as follows, in which case it works the same in both versions:

 def my_select(conditions):
     cond = conditions[0]
     for c in conditions[1:]:
         cond = c & cond
     stmt = select([column('x')])
     if cond is not None:
         stmt = stmt.where(cond)
     return stmt

  or you assume that “conditions” is non-empty, in which case, as I mentioned
 earlier, do this:

 def my_select(conditions):
     cond = conditions[0]
     for c in conditions[1:]:
         cond = c & cond

     return select([column('x')]).where(cond)







Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Claudio Freire
On Wed, Nov 13, 2013 at 2:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 On Nov 13, 2013, at 11:52 AM, Claudio Freire klaussfre...@gmail.com wrote:

 On Wed, Nov 13, 2013 at 1:45 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:

 Am Mittwoch, 6. November 2013 21:58:53 UTC+1 schrieb Michael Bayer:

 I wrote a full post regarding this topic on stackoverflow at
 http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768
 .  If you start with this, I can answer more specific questions.


 The article was very helpful, thanks. I still want to figure out the best
 balance between convenience and speed for my use case. Do the following make
 sense and is possible?

 I work only with Postgresql and I'm sure that all involved objects have a
 unique id column which is called 'id'.  So before doing a session.commit(),
 I could check how many objects are in my session. As I'm just bulk
 inserting, I know that all of them are new and don't have their id set yet.
 Now I ask the database for that number of new ids, iterate over the objects
 in my session and set the ids. Internally all ids would come from a single
 sequence, so I don't have to care about object types and so on. Afterwards
 SqlAlchemy should be aware that ids have already been set, so no generated
 ids have to be returned and the session.commit() should be much simpler and
 faster.

 Sounds like a still quite simple, but hopefully much faster solution. Do you
 agree?



 sure that should be fine, if you can pre-calc your PKs.   It just won’t work
 under any kind of concurrency, as in such a situation there could be
 interleaved INSERTs from different processes.


 Postgresql sequences already handle that kind of concurrency scenario.


 how exactly, if two transactions T1 and T2 both pull a number from a 
 sequence, T1 gets 40, T2 gets 41, then we pre-calc 10 inserts that have not 
 yet occurred for each, T1 has 40-50, T2 has 41-51, the number is totally 
 wrong for both - in reality it would be some random distribution of 40-60 
 between T1 and T2.   No ?


No, you ask the same sequence for 10 ids for each transaction, and the
sequence allocates 40-49 to T1 and 50-59 to T2.
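
Roughly, pre-fetching a block of ids from the shared sequence looks like this
(the sequence name and the pending objects are placeholders):

    from sqlalchemy import text

    n = len(pending_objects)
    ids = [row[0] for row in
           session.execute(text("SELECT nextval('my_id_seq') "
                                "FROM generate_series(1, :n)"), {"n": n})]
    for obj, new_id in zip(pending_objects, ids):
        obj.id = new_id
    session.commit()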



Re: [sqlalchemy] Re: using SQLA in distributed, concurrent, or asynchronous contexts?

2013-10-07 Thread Claudio Freire
On Mon, Oct 7, 2013 at 12:01 PM, Jonathan Vanasco jonat...@findmeon.com wrote:
 so the popular workaround(s) are to do both:

 1- defer to thread (which most people would do anyways)
 2- open/close multiple sessions (which are cheap).  Basically, instead of
 approaching a task in twisted as a single action of multiple components,
 think of it like a website, with each action as a new page request.  Your
 web session is the same, but your SqlAlchemy Session changes on each
 request.


There's a 3.

3- open a single request-scope session, and carry it through your async tasks.

You'll have to be super-careful not to use any thread-local state,
always use the request-scope session, but that's relatively easy with
most frameworks. Probably the easiest way would be to instantiate an
explicit session with a session_maker, and then carry it implicitly in
your callbacks as a closure. This is even easier on web frameworks
that support this kind of stuff built in, since it's so very common.

But, adapting your got_api example:

   def process_request():
       d = deferToThread(get_api, session_maker()).addCallback(got_api)

   def get_api(session):
       # use the request-scoped session; blocking work happens here, in the thread
       ...

   def got_api(result):
       # process data, back in the reactor thread
       ...

At first sight, this doesn't look all that better, but when you start
chaining deferreds, it will be.
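
A minimal sketch of what that chaining can look like, assuming Twisted, a
sessionmaker() instance stored in session_maker, and a hypothetical User
model; the Session travels through the callback chain as a closure and is
closed at the end:

    from twisted.internet.threads import deferToThread

    def process_request():
        session = session_maker()                # request-scoped session

        def load(s):
            # blocking DB work, runs in a thread
            return s.query(User).filter_by(name='ed').all()

        def render(users):
            # back in the reactor thread, no DB access here
            return [u.name for u in users]

        def cleanup(result):
            session.close()                      # always release the connection
            return result

        d = deferToThread(load, session)
        d.addCallback(render)
        d.addBoth(cleanup)
        return d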



Re: [sqlalchemy] using SQLA in distributed, concurrent, or asynchronous contexts?

2013-10-03 Thread Claudio Freire
On Thu, Oct 3, 2013 at 6:02 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 well there's always gevent.  It might even be possible to get twisted and
 gevent to work together.

 So far I've managed to stick with multiprocessing for any kind of
 parallelism and it's done fine, but I'm not scaling up to hundreds or
 thousands of simultaneous things going on.



 On Oct 3, 2013, at 2:23 PM, Iain Duncan iainduncanli...@gmail.com wrote:

 Hi folks, I just got a new position and am hoping I can advocate for using
 SQLAlchemy there. I will be doing some backend web service stuff that
 interacts with a db, but likely in some form of distributed architecture.
 It's all going to be on AWS. I believe there will be some service
 implementation that either needs to be highly performant or highly
 concurrent or some combination of both. IE there may be lots of clients
 hitting the service and the service has to interact with third party payment
 services which might be slow, and we don't want the server choking just
 waiting on the third party responses.

 Up till now I haven't had to do stuff in this domain. Can anyone tell me
 what I might want to look into as far as ways of handling concurrency or
 non-blocking services that will play fair with SQLAlchemy? I read that
 trying to get SA working on Twisted is not a good plan.


It depends a lot on the workloads.

I've tried gevent with pure SQLA apps, and it works quite dandy.

However, as soon as you start adding the stuff all applications are
made of (libraries), you risk breakage more and more. I found that out
the hard way, as in segfaults.

I think most relatively high-level pure-python libs should do fine.
Mixing multiprocessing with gevent doesn't do fine. Throw in other
reactors (zmq on threads in my case), and things also break pretty
badly.
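
One way to set that up (a minimal sketch, assuming the pure-Python pg8000
driver, whose socket I/O becomes cooperative once gevent monkey-patches the
standard library; the DSN is a placeholder):

    from gevent import monkey
    monkey.patch_all()   # must run before anything else imports socket/ssl

    from sqlalchemy import create_engine

    engine = create_engine("postgresql+pg8000://user:password@localhost/mydb",
                           pool_size=20)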



Re: [sqlalchemy] What is declarative_base() exactly?

2013-09-23 Thread Claudio Freire
On Mon, Sep 23, 2013 at 10:22 AM, Edward Kim onward.ed...@gmail.com wrote:
  >>> Base = declarative_base()
  >>> Base
  <class 'sqlalchemy.ext.declarative.api.Base'>


 How does this function return a class rather than an instance? Is it some
 kind of design pattern?

 I know it's not a big deal for just using SQLAlchemy, but I can't explain
 what it is
 or how it can return the class.


A class is an object like any other.

You can pass around references to classes like any other reference,
and you can create them just as well:

>>> def make_me_a_class():
...     class A(object):
...         def f(self):
...             print "I'm a class"
...     return A
...
>>> B = make_me_a_class()
>>> c = B()
>>> c.f()
I'm a class
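
declarative_base() does the same thing, just with more machinery: it
constructs a brand new class object at call time and returns it. A toy
sketch (not SQLAlchemy's actual implementation) of a function that builds a
class dynamically with type():

>>> def toy_declarative_base(name='Base', bases=(object,)):
...     # type(name, bases, namespace) creates and returns a new class object
...     return type(name, bases, {'registry': {}})
...
>>> Base = toy_declarative_base()
>>> Base
<class '__main__.Base'>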




Re: [sqlalchemy] Why no Raw Query Builder?

2013-08-10 Thread Claudio Freire
On Sat, Aug 10, 2013 at 11:31 AM, Taba Taba betak...@gmail.com wrote:
 Thank you so much. Another question:

 $this->select("col1");
 if(1 > 0) {
     $this->select("col2, col3");
 }
 $this->from("tbl")->where("1 = 1");
 if(2 > 1) {
     $this->where("2 > 1");
 }
 $this->left_outer_join("tbl2", "tbl2.t_id = tbl.id");
 // output: SELECT col1, col2, col3
 // FROM tbl
 // LEFT OUTER JOIN tbl2
 // ON tbl2.t_id = tbl.id
 // WHERE 1=1 AND 2 > 1

 On sqlalchemy?

 s = select([col1])
 if 1 > 0:
     s.append_column(col2, col3)
 s.from(tbl)  # from(...) not found in sqlalchemy.sql.expression.Select or _SelectBase
 s.where(1 = 1)
 if 2 > 1:
     s.where(2 > 1)
 s.outerjoin(tbl2, tbl2.t_id = tbl.id)  # doesn't work
 print s

 Thank you in advance!


Where's the question?

I think your problem, though, is that you didn't read the tutorial[0].
Or if you did, not carefully enough.

[0] http://docs.sqlalchemy.org/en/rel_0_8/core/tutorial.html#selecting
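
For what it's worth, here is a hedged sketch of roughly what that PHP example
looks like in Core, following the 0.8-era API from the tutorial above; the
Table definitions are made up to match the column names used:

    from sqlalchemy import MetaData, Table, Column, Integer, String, select, text

    metadata = MetaData()
    tbl = Table('tbl', metadata,
                Column('id', Integer, primary_key=True),
                Column('col1', String), Column('col2', String), Column('col3', String))
    tbl2 = Table('tbl2', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('t_id', Integer))

    # build the column list conditionally, then the statement generatively
    cols = [tbl.c.col1]
    if 1 > 0:
        cols.extend([tbl.c.col2, tbl.c.col3])

    s = select(cols).select_from(tbl.outerjoin(tbl2, tbl2.c.t_id == tbl.c.id))
    s = s.where(text("1 = 1"))
    if 2 > 1:
        s = s.where(text("2 > 1"))

    print s   # SELECT ... FROM tbl LEFT OUTER JOIN tbl2 ON tbl2.t_id = tbl.id WHERE ...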



Re: [sqlalchemy] sorry, too many clients already

2013-07-26 Thread Claudio Freire
On Thu, Jul 25, 2013 at 8:39 PM, kris kkvilek...@gmail.com wrote:


 On Thursday, July 25, 2013 4:12:50 PM UTC-7, Klauss wrote:

 On Thu, Jul 25, 2013 at 7:58 PM, kris kkvil...@gmail.com wrote:
 
  My postgres.conf has a parameter  max_connections = 100

 That's not only the default, but it's also not really recommended to
 push it much higher, so only do so if you really need a big pool on
 each machine, and if you're sure that pool will be mostly idle all of
 the time.


 Hmm.. I just bumped it 200 and modified the shared_buffers to be 32MB

Perhaps a bit OT on this list, but you probably want to increase that further.

Check out postgres' tuning wiki[0]

[0] http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server





Re: [sqlalchemy] sorry, too many clients already

2013-07-25 Thread Claudio Freire
On Thu, Jul 25, 2013 at 7:32 PM, kris kkvilek...@gmail.com wrote:

 We are running a replicated (multiple machines behind ngnix) Turbogears
 2.1.5 App With SA 0.7.8 and postgres 8.4

 After a lot of recent usage the system ground to halt and we are receiving
 (OperationalError) FATAL: sorry, too many clients already

 while trying to any DBSession.query.

 Any ideas what could cause this?

As the message says, you have too many open connections.

When you have multiple machines, you must either provision your
postgres to be able to handle a full connection pool (max_overflow
preferably, pool_size at least) for each nginx worker on each
machine, or have SQLA connect to a pgbouncer that will do the shared
pooling for you.





Re: [sqlalchemy] sorry, too many clients already

2013-07-25 Thread Claudio Freire
On Thu, Jul 25, 2013 at 7:58 PM, kris kkvilek...@gmail.com wrote:

 My postgres.conf has a parameter  max_connections = 100

That's not only the default, but it's also not really recommended to
push it much higher, so only do so if you really need a big pool on
each machine, and if you're sure that pool will be mostly idle all of
the time.

 I was led to believe the SA would manage pool of connections
 to some good default
 (http://docs.sqlalchemy.org/en/rel_0_7/core/pooling.html?highlight=connection%20pool),
 but couldn't find the default values?  Just so I can relate the two figures
 (SA Pool parameters and PG max_connections)

pool_size, see create_engine[0], the default is 5

 Also, is there a good way to get it to close some idle connections?

Yes, decrease pool_size. Just how many workers do you have? With the
defaults, you'd need a little under 20 workers to start getting those
errors.

With that many workers, your only real solution is to install a
pgbouncer to manage those connections globally. SQLA cannot manage a
pool of connections across workers, only a shared pool (pgbouncer) can
do that. SQLA's pool is local to each process (worker).

[0] 
http://docs.sqlalchemy.org/en/rel_0_7/core/engines.html#sqlalchemy.create_engine
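
A minimal sketch of sizing the per-process pool explicitly, so that
(pool_size + max_overflow) times the number of worker processes stays under
PostgreSQL's max_connections; the DSN is a placeholder:

    from sqlalchemy import create_engine

    engine = create_engine(
        "postgresql://user:password@localhost/mydb",
        pool_size=5,       # steady-state connections kept open per process
        max_overflow=2,    # extra connections allowed under burst load
        pool_timeout=30,   # seconds to wait for a free connection before erroring
    )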





Re: [sqlalchemy] Query caching allows two objects representing same underlying data in session?

2013-07-16 Thread Claudio Freire
On Mon, Jul 15, 2013 at 8:02 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 On Jul 15, 2013, at 4:51 PM, Amir Elaguizy aelag...@gmail.com wrote:

 I'm having this weird problem using the query caching recipes in which two 
 instances of a model representing the same underlying dataset will both get 
 into the session.

 I know this is happening because I put all of the models in a set() and 
 there are two instances with the same underlying database row id.

 I was under the impression that the session itself would handle the case 
 that an object coming from the query cache is already in the session, 
 preventing duplication. Is this not the case?

 well you need to be using the merge() aspect of it, which will reconcile an 
 existing identity that's already in the session.  the recipe as written uses 
 merge_result() so will ensure this, yes.This only deals with the identity 
 map though, if you have an object pending with a given identity, its not in 
 the identity map.   I'd advise against heavy usage of cached queries 
 overlapping with lots of pending objects within the same sets because things 
 can get very crazy.

Because of all that craziness, I came to the realization that you
can only cache detached instances, and you can never merge them into
your session. If you do, you may end up with a concurrency mess if
two threads want to merge the same cached instance into a session.

To put a cached instance into a session, you must first copy it, then
update it. How to do that is very application-specific, and I don't
think it can be automated.





Re: [sqlalchemy] Query caching allows two objects representing same underlying data in session?

2013-07-16 Thread Claudio Freire
On Tue, Jul 16, 2013 at 4:22 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 To put a cached instance into a session, you must first copy it, then
 update. How to do that, is very application-specific, and I don't
 think it can be automated.

 that's what merge(dont_load=True) does. If your object has other aspects 
 to it that are outside of what's mapped, then there are event hooks you can 
 use to re-instantiate additional state.


Oh... didn't know that. That's interesting. I've never toyed with the
event system before, but I guess there's an incentive right there.
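
For reference, merge() with load=False (the spelling that replaced
dont_load=True) is the relevant call; a minimal sketch, assuming cached_user
is a detached instance pulled from a hypothetical external cache and Session
is a sessionmaker configured elsewhere:

    session = Session()
    cached_user = some_cache.get('user:5')   # hypothetical cache lookup, returns a detached instance

    # merge with load=False reconciles against the identity map without a SELECT
    local_user = session.merge(cached_user, load=False)
    # local_user is attached to this session; cached_user stays detached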





Re: [sqlalchemy] Query and compiled_cache

2013-06-07 Thread Claudio Freire
On Thu, Jun 6, 2013 at 2:20 PM, Michael Bayer mike...@zzzcomputing.com
wrote:

 diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
 index c2ec72c..b458975 100644
 --- a/lib/sqlalchemy/orm/query.py
 +++ b/lib/sqlalchemy/orm/query.py
 @@ -40,12 +40,15 @@ from ..sql import (
  __all__ = ['Query', 'QueryContext', 'aliased']


 -def _generative(*assertions):
 +def _generative(*assertions, maintain_baked=False):
  Mark a method as generative.

  @util.decorator
  def generate(fn, *args, **kw):
  self = args[0]._clone()
 +if not maintain_baked:
 +del self._baked_cache
 +del self._baked_context
  for assertion in assertions:
  assertion(self, fn.func_name)
  fn(self, *args[1:], **kw)
 @@ -1157,7 +1160,7 @@ class Query(object):

  self._lockmode = mode

 -@_generative()
 +@_generative(maintain_baked=True)
  def params(self, *args, **kwargs):
  add values for bind parameters which may have been
  specified in filter().


That doesn't seem to be enough.

subqueryload seems to be using the wrong query still, after clearing the
baked context, resulting in some very wrong sharing of connections between
threads (I'm getting some very fun segmentation faults).

I think it's with_parent, it clears the baked context through a filter, but
the filter is called on the wrong query (the global query instance I use as
template), that has the wrong session attached or something.

This is my current BakedQuery:

class BakedQuery(sqlalchemy.orm.query.Query):
    _baked_context = None
    _baked_cache = None

    def _clone(self):
        rv = super(BakedQuery, self)._clone()
        try:
            del rv._baked_context
            del rv._baked_cache
        except AttributeError:
            pass
        return rv

    def params(self, *p, **kw):
        rv = super(BakedQuery, self).params(*p, **kw)
        rv._baked_context = self._baked_context
        rv._baked_cache = self._baked_cache
        return rv

    def with_session(self, *p, **kw):
        rv = super(BakedQuery, self).with_session(*p, **kw)
        rv._baked_context = self._baked_context
        rv._baked_cache = self._baked_cache
        return rv

    @sqlalchemy.orm.query._generative()
    def bake_as(self, name, cache):
        """Freeze the statement used by this Query."""
        if name not in cache:
            cache[name] = context = self._compile_context()
            del context.session
            del context.query
        self._baked_context = cache[name]
        self._baked_cache = cache

    def _compile_context(self, **kw):
        if self._baked_context is not None:
            QueryContext = sqlalchemy.orm.query.QueryContext
            context = QueryContext.__new__(QueryContext)
            context.__dict__.update(self._baked_context.__dict__)
            context.query = self
            context.session = self.session
            # need to fix these names, urg
            context.attributes = context._attributes = context.attributes.copy()
            return context
        else:
            return super(BakedQuery, self)._compile_context(**kw)

    def _execute_and_instances(self, querycontext):
        if self._baked_cache is not None:
            self = self.execution_options(compiled_cache=self._baked_cache)
        return super(BakedQuery, self)._execute_and_instances(querycontext)



And I invoke it like:

    def some_function(query=blabla.bake_as(blablah)):
        return query.with_session(S).params(...).first()

This code still breaks if I don't use a baked template:

    def some_function(query=blabla):
        return query.with_session(S).bake_as(blablah).params(...).first()





Re: [sqlalchemy] Query and compiled_cache

2013-06-07 Thread Claudio Freire
On Fri, Jun 7, 2013 at 1:56 PM, Claudio Freire klaussfre...@gmail.com wrote:

 This is my current BakedQuery:

 class BakedQuery(sqlalchemy.orm.query.Query):



F**k gmail again. Why the F+!k doesn't it show me that it'll bork all
whitespace before I hit send... I smell a bug report coming...

class BakedQuery(sqlalchemy.orm.query.Query):
    _baked_context = None
    _baked_cache = None

    def _clone(self):
        rv = super(BakedQuery, self)._clone()
        try:
            del rv._baked_context
            del rv._baked_cache
        except AttributeError:
            pass
        return rv

    def params(self, *p, **kw):
        rv = super(BakedQuery, self).params(*p, **kw)
        rv._baked_context = self._baked_context
        rv._baked_cache = self._baked_cache
        return rv

    def with_session(self, *p, **kw):
        rv = super(BakedQuery, self).with_session(*p, **kw)
        rv._baked_context = self._baked_context
        rv._baked_cache = self._baked_cache
        return rv

    @sqlalchemy.orm.query._generative()
    def bake_as(self, name, cache):
        """Freeze the statement used by this Query."""
        if name not in cache:
            cache[name] = context = self._compile_context()
            del context.session
            del context.query
        self._baked_context = cache[name]
        self._baked_cache = cache

    def _compile_context(self, **kw):
        if self._baked_context is not None:
            QueryContext = sqlalchemy.orm.query.QueryContext
            context = QueryContext.__new__(QueryContext)
            context.__dict__.update(self._baked_context.__dict__)
            context.query = self
            context.session = self.session
            # need to fix these names, urg
            context.attributes = context._attributes = context.attributes.copy()
            return context
        else:
            return super(BakedQuery, self)._compile_context(**kw)

    def _execute_and_instances(self, querycontext):
        if self._baked_cache is not None:
            self = self.execution_options(compiled_cache=self._baked_cache)
        return super(BakedQuery, self)._execute_and_instances(querycontext)





Re: [sqlalchemy] Query and compiled_cache

2013-06-06 Thread Claudio Freire
On Tue, Jun 4, 2013 at 5:26 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Sun, Jun 2, 2013 at 9:41 PM, Claudio Freire klaussfre...@gmail.com
 wrote:

 So the whole thing is rolled up into the named thing I referred to
 also, so that there's no need to keep a Query object hanging around, when we
 say bake() we're really just referring to a position in the code
 somewhere, so I've updated the wiki recipe to use a named system like this:

 q = s.query(Foo).\
     filter(Foo.data == bindparam('foo')).\
     bake_as("foo", cache)
 result = q.params(foo='data 12').all()


 A highly cleaned up version of your test is attached.

 I'm still not sure I'm getting everything accounted for here!  thanks for
 testing !   The feature is actually looking quite simple and probably works
 better as something built in, or at least if we added some methods to
 QueryContext to ease the burden of caching/copying it.


 Well, if that works, it certainly covers my needs so there would be no
 pressing need to incorporate it into the core.
 I'll let you know tomorrow.



  I've done only superficial testing for now, I have to build myself a test
  database, but I thought I'd let you know: it seems to work flawlessly so
  far.


Seems to break unpredictably with subqueryloads.

I'll try to get a test for it.





Re: [sqlalchemy] Query and compiled_cache

2013-06-04 Thread Claudio Freire
On Sun, Jun 2, 2013 at 9:41 PM, Claudio Freire klaussfre...@gmail.comwrote:

 So the whole thing is rolled up into the named thing I referred to also,
 so that there's no need to keep a Query object hanging around, when we say
 bake() we're really just referring to a position in the code somewhere,
 so I've updated the wiki recipe to use a named system like this:

  q = s.query(Foo).\
      filter(Foo.data == bindparam('foo')).\
      bake_as("foo", cache)
  result = q.params(foo='data 12').all()


 A highly cleaned up version of your test is attached.

 I'm still not sure I'm getting everything accounted for here!  thanks for
 testing !   The feature is actually looking quite simple and probably works
 better as something built in, or at least if we added some methods to
 QueryContext to ease the burden of caching/copying it.


 Well, if that works, it certainly covers my needs so there would be no
 pressing need to incorporate it into the core.
 I'll let you know tomorrow.



I've done only superficial testing for now, I have to build myself a test
database, but I thought I'd let you know: it seems to work flawlessly so
far.

:^)

Thanks





Re: [sqlalchemy] Query and compiled_cache

2013-06-02 Thread Claudio Freire
On Sat, Jun 1, 2013 at 10:59 PM, Michael Bayer mike...@zzzcomputing.comwrote:

 The recipe on the wiki also has the issue that it isn't even caching
 anything to do with the QueryContext, including all of this information
 regarding eager joins which is pretty important.  Your modifications try to
 correct for this by storing that context, but then it still creates a
 brand new context anyway and just transfers not nearly enough of its state
 over for things to work.


Yes, I was trying to keep it minimal to figure out the error up there,
but I did start out the way you describe, by storing everything
session-independent from the query context.


 So the whole thing is rolled up into the named thing I referred to also,
 so that there's no need to keep a Query object hanging around, when we say
 bake() we're really just referring to a position in the code somewhere,
 so I've updated the wiki recipe to use a named system like this:

  q = s.query(Foo).\
      filter(Foo.data == bindparam('foo')).\
      bake_as("foo", cache)
  result = q.params(foo='data 12').all()


 A highly cleaned up version of your test is attached.

 I'm still not sure I'm getting everything accounted for here!  thanks for
 testing !   The feature is actually looking quite simple and probably works
 better as something built in, or at least if we added some methods to
 QueryContext to ease the burden of caching/copying it.


Well, if that works, it certainly covers my needs so there would be no
pressing need to incorporate it into the core.
I'll let you know tomorrow.





Re: [sqlalchemy] Query and compiled_cache

2013-06-02 Thread Claudio Freire
On Mon, Jun 3, 2013 at 1:55 AM, Michael Bayer mike...@zzzcomputing.comwrote:

 Well, if that works, it certainly covers my needs so there would be no
 pressing need to incorporate it into the core.
 I'll let you know tomorrow.



 It's so straightforward at this point I'm leaning towards some more
 cleanup and adding to 0.9.   I'll probably cache the compiled statements on
 the QueryContext itself too, and the cache dictionary will be via Session /
 Sessionmaker.



Bear in mind that the use case I'm facing, not only now but always,
involves queries across many different, short-lived sessions. So making it
Session-local won't live long enough, unless it can be given to the
sessionmaker itself and have that caching context be shared among all
sessions.





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Thu, May 30, 2013 at 7:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 The hashing thing really has to start as a core concept first.   It's a big 
 job but would be very helpful for caching scenarios and would allow us to 
 build this feature on Query without too much difficulty.  The nice thing 
 about unhashable is that simple queries will be hashable, but as soon as 
 complexity increases you'd start seeing unhashables come in, preventing us 
 from caching something that isn't actually easy to cache.

AFAIK only py3 has support for making user classes unhashable.





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Fri, May 31, 2013 at 11:29 AM, Michael Bayer
mike...@zzzcomputing.com wrote:


 On Friday, May 31, 2013 10:18:41 AM UTC-4, Klauss wrote:

 On Thu, May 30, 2013 at 7:04 PM, Michael Bayer mik...@zzzcomputing.com
 wrote:
 
  The hashing thing really has to start as a core concept first.   It's a
  big job but would be very helpful for caching scenarios and would allow us
  to build this feature on Query without too much difficulty.  The nice thing
  about unhashable is that simple queries will be hashable, but as soon as
  complexity increases you'd start seeing unhashables come in, preventing us
  from caching something that isn't actually easy to cache.

 AFAIK only py3 has support for making user classes unhashable.


 I'm not considering using `__hash__()` for this, I'd rather keep it as a
 special method for this purpose.

 But after sleeping on it, I'm still pretty skeptical, because it's actually
 pretty difficult to determine what parts of a statement will remain
 constant across backends.

It doesn't need to be constant across backends. The compiled_cache
machinery already includes the dialect in the cache key, so only
semantic constancy would be required.


 If you have a select like, SELECT x + ? FROM q, where ? is a bound
 parameter, that statement won't run on some backends which don't allow bound
 parameters in the columns clause.   So a select() object select([x + 3]),
 we would theoretically have to include the number 3 as part of its cache
 key...but based on where the 3 is present.   Similar things happen when
 you say select().limit(x) - LIMIT can usually be rendered via bound
 parameter, but not on backends like Sybase or SQL Server where it is
 rendered in the TOP clause that can't be bound.

So yeah, you don't have to care about that. It's taken care of at other
levels.





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Fri, May 31, 2013 at 12:46 PM, Michael Bayer
mike...@zzzcomputing.com wrote:
 this hash works for backends that can render LIMIT as a bound parameter.  It 
 will *not* work for SQL server which cannot render LIMIT as a bound parameter.

 If the hash is determined at the level of Query, we *do not* know whether or 
 not the backend supports LIMIT as a bound parameter, unless we ask it.

 So OK, we make more rules - ask the backend if LIMIT can be hashed as a bound 
 parameter.

 But then what if a particular backend has more restrictive bound parameter 
 rules than Query is aware of?   What if we throw the Firebird backend at it, 
 and all the sudden Firebird has some quirk where you can't put a bound 
 parameter inside of a CASE statement inside of the columns clause?  How will 
 Query know that suddenly another deeply embedded bound parameter can no 
 longer be considered hashable as a bound parameter, and must be hashed as a 
 literal value ?


I see.

The solution is to create a new object type, QueryKey, that contains
both the Query and the Dialect, and builds its hash either by invoking
Query.hash(dialect), or by visiting it somehow.

The underlying problem is that Query cannot decide the hash by itself,
so it shouldn't try to. It should only support building a hash with
respect to a specific dialect. The only thing required of it is that
the hash be stable within cacheable queries of that dialect; no need
to ponder hashability across all dialects.
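
A conceptual sketch of that key object; Query.cache_key(dialect) here is
hypothetical, standing in for whatever stable, dialect-relative key the Query
would learn to produce:

    class QueryKey(object):
        def __init__(self, query, dialect):
            # the key is only meaningful relative to one dialect
            self._key = (dialect.name, query.cache_key(dialect))

        def __hash__(self):
            return hash(self._key)

        def __eq__(self, other):
            return isinstance(other, QueryKey) and self._key == other._key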





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Thu, May 30, 2013 at 3:28 PM, Claudio Freire klaussfre...@gmail.com
wrote:
 On Thu, May 30, 2013 at 2:25 PM, Michael Bayer mike...@zzzcomputing.com
wrote:

 On May 30, 2013, at 11:25 AM, Claudio Freire klaussfre...@gmail.com
wrote:


 That way, one could use the second form up there and benefit from
 query hashing, because session/param binding wouldn't change the hash,
 and it would be a cache hit. Has it been explored already? Or maybe
 there's something wrong on how I'm using the compiled_cache thing?
 Should I start patching? ;-)

 there is a very wide gap between working with the baked query recipe,
for which I'm not aware of all the errors you refer to so more detail would
help

 Um... I don't remember the exact problems, will have to try it again.
 I just assumed it was targeted at an older SA release and gave up on
 it rather quickly.


So, this is what I'm getting, which is weird:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python2.7/site-packages/chorde-0.1-py2.7-linux-x86_64.egg/chorde/decorators.py", line 184, in cached_f
    rv = f(*p, **kw)
  File "metrics.py", line 291, in action_metrics
    if action is not None and action.campaign is not None:
  File "/usr/local/lib64/python2.7/site-packages/SQLAlchemy-0.7.10-py2.7-linux-x86_64.egg/sqlalchemy/orm/attributes.py", line 458, in get
    value = self.callable_(state, passive)
  File "/usr/local/lib64/python2.7/site-packages/SQLAlchemy-0.7.10-py2.7-linux-x86_64.egg/sqlalchemy/orm/strategies.py", line 481, in _load_for_state
    (mapperutil.state_str(state), self.key)
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Action at 0x36f04d0> is not bound to a Session; lazy load operation of attribute 'campaign' cannot proceed

With:

class CacheableQuery(sqlalchemy.orm.query.Query):
    def __init__(self, *p, **kw):
        self._cached_context = None
        self._cached_context_labels = None
        super(CacheableQuery, self).__init__(*p, **kw)

    @sqlalchemy.orm.query._generative()
    def bake(self, labels=True):
        self._compile_context(labels)

    def _compile_context(self, labels=True):
        if self._cached_context and self._cached_context_labels == labels:
            context = super(CacheableQuery, self)._compile_context(labels)
            cached = self._cached_context
            context.statement = cached.statement
        else:
            context = super(CacheableQuery, self)._compile_context(labels)
            self._cached_context_labels = labels
            self._cached_context = context
        return context

All the rest in baked query recipe is already supported by SA, so I didn't
include it. Like with_session instead of from_session, and
execution_options() instead of hardcoding a cache.

Again, the usage is

q = blabla.bake()

...
q.with_session(S).params(..).first()

I have no idea why replacing the query messes up the entity's session. Any
clue?





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Fri, May 31, 2013 at 4:03 PM, Claudio Freire klaussfre...@gmail.comwrote:

 With:

 class CacheableQuery(sqlalchemy.orm.query.Query):

 def __init__(self, *p, **kw):

 self._cached_context = None

 self._cached_context_labels = None

 super(CacheableQuery, self).__init__(*p, **kw)

  @sqlalchemy.orm.query._generative()

 def bake(self, labels=True):

 self._compile_context(labels)


  def _compile_context(self, labels=True):

 if self._cached_context and self._cached_context_labels == labels:

 context = super(CacheableQuery, self)._compile_context(labels)

 cached = self._cached_context
 context.statement = cached.statement
 else:

 context = super(CacheableQuery, self)._compile_context(labels)

 self._cached_context_labels = labels

 self._cached_context = context

 return context



Damn gmail

With:

class CacheableQuery(sqlalchemy.orm.query.Query):
    def __init__(self, *p, **kw):
        self._cached_context = None
        self._cached_context_labels = None
        super(CacheableQuery, self).__init__(*p, **kw)

    @sqlalchemy.orm.query._generative()
    def bake(self, labels=True):
        self._compile_context(labels)

    def _compile_context(self, labels=True):
        if self._cached_context and self._cached_context_labels == labels:
            context = super(CacheableQuery, self)._compile_context(labels)
            cached = self._cached_context
            context.statement = cached.statement
        else:
            context = super(CacheableQuery, self)._compile_context(labels)
            self._cached_context_labels = labels
            self._cached_context = context
        return context





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Fri, May 31, 2013 at 4:44 PM, Michael Bayer mike...@zzzcomputing.comwrote:

 can you just attach a working .py script



How does that work without a database?





Re: [sqlalchemy] Query and compiled_cache

2013-05-31 Thread Claudio Freire
On Fri, May 31, 2013 at 4:47 PM, Claudio Freire klaussfre...@gmail.comwrote:


 On Fri, May 31, 2013 at 4:44 PM, Michael Bayer 
 mike...@zzzcomputing.comwrote:

 can you just attach a working .py script



 How does that work without a database?


Ok, I took one of SQLA's tests and made it break ;)

Notice the problem here is that I close the session after querying.

Since the baked query has a joinedload, it shouldn't matter, but it does,
because when baking, eager loads are broken somehow.





test_baked.py
Description: Binary data


[sqlalchemy] Query and compiled_cache

2013-05-30 Thread Claudio Freire
I know this has been discussed a great deal already, but I've been
noticing this:

SomeClass.query()\
    .filter_by(id = blah)\
    .execution_options(compiled_cache = _orm_query_cache)\
    .options(
        orm.joinedload(blah),
        orm.joinedload(blah, bleh),
        orm.joinedload(blah, bleh, blih),
        orm.joinedload(blah, bleh, blih, bloh),
    ).first()

It adds one entry in _orm_query_cache each time I run it.

The cache is a dict-like LRUCache object I intend to have in the
running application to avoid recompiling queries all the time. I've
also tried it like this:

_some_query = SomeClass.query()\
    .filter_by(id = bindparam("bleh"))\
    .execution_options(compiled_cache = _orm_query_cache)\
    .options(
        orm.joinedload(blah),
        orm.joinedload(blah, bleh),
        orm.joinedload(blah, bleh, blih),
        orm.joinedload(blah, bleh, blih, bloh),
    )

globally, and in my hot function:

_some_query.with_session(session).params(bleh = bleh).first()

Both ways behave the same with regard to the cache. This will be in a
very hot path; I know compilation CPU usage pales in comparison with
the roundtrip, but since the path is so hot, I expect compiling
repeatedly to be troublesome:

I expect an unholy amount of concurrency (on the order of 10k rps),
and I don't want a few of these compilations bogging down, even
minutely, other concurrent transactions; the application has very
stringent latency requirements, and even a small hiccup would be
unacceptable.

I'm on latest 0.7, also tried 0.8. I tried the baked query recipe, and
it raises exceptions everywhere. I tried to patch it up, and it didn't
work (it would cache results as well which was really unintended).

But never mind the baked query's brokenness; from what I've seen, and read
in other threads in the archive, the problem is that queries are appended
to the cache by object identity. I'm thinking it wouldn't be hard
to make the Query object hashable, or at least to add a bake()
method that freezes it for hashing (prohibits hash-changing
alterations?).

That way, one could use the second form up there and benefit from
query hashing, because session/param binding wouldn't change the hash,
and it would be a cache hit. Has it been explored already? Or maybe
there's something wrong on how I'm using the compiled_cache thing?
Should I start patching? ;-)





Re: [sqlalchemy] Query and compiled_cache

2013-05-30 Thread Claudio Freire
On Thu, May 30, 2013 at 2:25 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 On May 30, 2013, at 11:25 AM, Claudio Freire klaussfre...@gmail.com wrote:


 That way, one could use the second form up there and benefit from
 query hashing, because session/param binding wouldn't change the hash,
 and it would be a cache hit. Has it been explored already? Or maybe
 there's something wrong on how I'm using the compiled_cache thing?
 Should I start patching? ;-)

 there is a very wide gap between working with the baked query recipe, for 
 which I'm not aware of all the errors you refer to so more detail would help

Um... I don't remember the exact problems, will have to try it again.
I just assumed it was targeted at an older SA release and gave up on
it rather quickly.

 and that of patching SQLAlchemy with new features.   In particular I can't 
 imagine how it would cache results, there is nothing like that in the recipe 
 or compiled cache system.

Something with sharing the context among query instances, since query
instances cache results (or so I heard). Not sure about the
specifics of how that happened, but I did notice, after patching up
the recipe to not throw exceptions, that it would always return the
results of the first call. The queries themselves had no bind
parameters or anything; they would just return different results each
time due to concurrent updates to the database. Maybe that wasn't
expected for that recipe? In any case, I just gave up on it without
looking into it much.

 If you want to work on a feature that is actually going to change SQLAlchemy, 
 (and would that be before or after you finish #2720? :) ), it would be:

After, I didn't forget, just real life real work priorities made me
veer away from it. Since it was for 0.9, I judged I could safely delay
2720 a bit while I take care of work related priorities ;-)

 1. there is no bake() method, the entire thing is transparent

 2. as one calls session.query(X).filter(Y).join(Z), the hashing scheme is 
 working the entire time in the background against a single LRU cache, that is 
 perhaps per sessionmaker(), or otherwise, to produce cached versions of the 
 Query.  Everyone would get the performance gains for free in all cases.   A 
 flag or other system would exist to turn the feature off for those who are 
 heavily subclassing Query and having issues or other edge backwards 
 compatible issues.

That would mean Query objects would by default take the compiled_cache
from the session, and not only during flushes but always. If that's
alright, sure.

 3. it would just be on by default in a new major release like 0.9 or 1.0

I had assumed that much.

 4. it would have a super crapload of very complete and clean unit tests.

Ehm... I would imagine all the current tests involving Query would
cover most of it. A few more cache-specific tests could be added
surely, but only to check caching is indeed happening, correctness
should be checked by existing tests already.

 Otherwise, the bake() recipe as it is can be enhanced or augmented with 
 __hash__() methods and all that but I'm not aware of anything regarding it 
 that would require changes to SQLAlchemy itself, since it uses a Query 
 subclass.

Well, yeah, I guess so. But that subclass would have to hook into all of
Query's methods to be able to compute, cache and update the hash
(computing it every time would be almost as costly as compiling, so it has
to be cached in an instance attribute). That'd be a chore, and it
would break with every other release.





Re: [sqlalchemy] Query and compiled_cache

2013-05-30 Thread Claudio Freire
On Thu, May 30, 2013 at 3:45 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 On May 30, 2013, at 2:28 PM, Claudio Freire klaussfre...@gmail.com wrote:


 4. it would have a super crapload of very complete and clean unit tests.

 Ehm... I would imagine all the current tests involving Query would
 cover most of it. A few more cache-specific tests could be added
 surely, but only to check caching is indeed happening, correctness
 should be checked by existing tests already.

 well the current tests suffer very much from years of being integration tests 
 and not unit tests.   These tests would actually do a terrible job of testing 
 this cache, as almost all tests use a brand new mapping, a brand new session, 
 and emit just one query.  none of the things that can go wrong with caching, 
 such as the multiple result issue you're describing with the bake() recipe, 
 would be exercised by current tests.Memory leaks, subtle changes in 
 queries, all that stuff.It also sort of depends on how the feature comes 
 out, how hard it will be to verify its correctness.

Um... that might make the task a lot bigger than it should be. I'll
have to look into it.

 Otherwise, the bake() recipe as it is can be enhanced or augmented with 
 __hash__() methods and all that but I'm not aware of anything regarding it 
 that would require changes to SQLAlchemy itself, since it uses a Query 
 subclass.

 Well, yeah, I guess so. But that subclass would have to step on all of
 Query methods to be able to compute, cache and update the hash
 (computing it always would be almost as costly as compiling, so it has
 to be cached in an instance attribute). That'd be a chore, and it
 would break every other release.

 Can't a Query generate its hash from its current state, without generative 
 methods being called ? Otherwise, the generative methods do run through a 
 common system, which is the @_generative decorator.

I'm not familiar enough with Query internals yet to answer this, but I'll
look into it. I've been thinking that if caching is conditional on
nothing structural about the query changing, and if we just want to
support that pattern I mentioned above (where you have a global query
object from which you build session-bound ones with with_session),
it could be as cheap as taking the internals' identity as hash. That
wouldn't work for the usual query building patterns, but then again,
when you build a new object, you're already paying a cost similar to
compiling, so caching would only really benefit the case where you
cache the expression externally.

In case I'm not clear, this would not be cached if I were to take id(internals)

query(Blah).filter(blah).join(blah).first()

But I don't care, because that's expensive on its own. This would:

q = query(Blah).filter(blah).join(blah)

...
q2 = q.with_session(S).params(blah).first()


On Thu, May 30, 2013 at 4:05 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 On May 30, 2013, at 2:28 PM, Claudio Freire klaussfre...@gmail.com wrote:

 On Thu, May 30, 2013 at 2:25 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:

 If you want to work on a feature that is actually going to change 
 SQLAlchemy, (and would that be before or after you finish #2720? :) ), it 
 would be:

 After, I didn't forget, just real life real work priorities made me
 veer away from it. Since it was for 0.9, I judged I could safely delay
 2720 a bit while I take care of work related priorities ;-)

 also, I find an overhaul to Query such that it's self-hashing a lot more 
 interesting than #2720.  It would be a much bigger performance savings and it 
 would apply to other interpreters like pypy too.Replacements of tiny 
 sections of code with C, not that interesting :) (redoing all the C in pyrex 
 is more interesting but not necessarily a priority).

The C extension is already done, and I think I sent the latest
version, haven't I?

The only thing remaining of 2720 is turning it all into pyrex code.





Re: [sqlalchemy] Query and compiled_cache

2013-05-30 Thread Claudio Freire
On Thu, May 30, 2013 at 4:19 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 In case I'm not clear, this would not be cached if I were to take 
 id(internals)

 query(Blah).filter(blah).join(blah).first()

 But I don't care, because that's expensive on its own.

 result caching (because we're calling first()) is a whole different thing.   
 the dogpile.cache recipe addresses that - its a widely used recipe and sure, 
 that would make a great feature too, I keep it as a recipe so that people who 
 use it have spent time understanding it.


Oh no, I was referring to the query not the result. Results will never
be cached with what I propose.





Re: [sqlalchemy] Can SQLAlchemy execute multiple select statements in a single round trip?

2013-05-23 Thread Claudio Freire
On Thu, May 23, 2013 at 3:32 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 If not, it would be nice to call .future() or .promise() on a
 `sqlalchemy.orm.query.Query` instance instead of .all(), to batch multiple
 queries and have them executed in a single round trip.  The way NHibernate
 works is that it will execute all the queries marked with .future() when an
 attempt is made to access the results of one of them.  So if
 you've called .future() on 5 queries, but start to access the results from
 the 3rd query before .future() was called on the remaining 2 queries, it will
 make 2 round trips.


 its not something DBAPI has consistent support for, a few backends allow
 joining of statements with semicolons like SQL server, but for the most
 prominently used systems like Postgresql and SQLite, it's not generally
 possible.


In postgres, it could be implemented with Async I/O and multiple
cursors, but sadly Async is something of a global pool configuration,
not something you can turn on/off per call.





Re: [sqlalchemy] Can SQLAlchemy execute multiple select statements in a single round trip?

2013-05-23 Thread Claudio Freire
On Thu, May 23, 2013 at 4:21 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 In postgres, it could be implemented with Async I/O and multiple
 cursors, but sadly Async is something of a global pool configuration,
 not something you can turn on/off per call.

 IMHO stuffing async calls and such in an attempt to get two statements to go 
 at once is deeply beyond all lines of diminishing returns :).

If it means making all of SA async, totally. If it could be done only
for those queries, it would be wonderful. But it's not so.





Re: [sqlalchemy] EAV Optimizations

2013-05-16 Thread Claudio Freire
On Thu, May 16, 2013 at 7:16 PM, Simon King si...@simonking.org.uk wrote:
 I don't think this would be a good match for HSTORE - I assume that 
 retrieving values from HSTORE won't be as efficient as retrieving them from 
 their own rows.

It depends on the number of attributes per test. The access itself will
never be as efficient, but it may make up for the overhead of having one
row per attribute. Furthermore, you can create functional indexes over a
particularly interesting attribute, and with partial indexes you can even
filter, which means it could be pretty fast if there's no need to run
arbitrary queries. I don't think trends over time would be a problem for an
HSTORE-based schema, since you'd already pay the cost of a sequential scan
anyway. This is all assuming there's a limited and rather small number of
attributes per test (dozens?). Otherwise EAV is probably better.
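
A hedged sketch of those indexes, assuming a `tests` table with an hstore
column named `attrs` and attribute names made up for illustration, issued
through SQLAlchemy as raw DDL:

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://user:password@localhost/mydb")
    with engine.begin() as conn:
        # functional index over one interesting attribute
        conn.execute(text(
            "CREATE INDEX ix_tests_voltage ON tests ((attrs -> 'voltage'))"))
        # partial index that also filters on an attribute value
        conn.execute(text(
            "CREATE INDEX ix_tests_failed ON tests ((attrs -> 'result')) "
            "WHERE attrs -> 'result' = 'failed'"))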





Re: [sqlalchemy] EAV Optimizations

2013-05-14 Thread Claudio Freire
On Tue, May 14, 2013 at 12:38 PM, Michael Bayer
mike...@zzzcomputing.com wrote:
 storing pickles in the database is an awful idea since your data is now 
 specific not just to Python but to the current shape of your object model at 
 a point in time.   Storing JSON is not acceptable for the case where you need 
 to audit and version each user that has made changes to particular keys, as 
 well as a history of those changes.

You could certainly add audit information to the json. Case in point:
json is functionally equivalent to hstore, so if you were considering
hstore, you can also use json.

I agree pickle has its issues, not the least of which is security, but
it's not as specific to Python[0] as it seems, nor is it any more
dependent on the object's structure than your EAV is dependent on attribute
semantics. The issues with pickle aren't of that sort; they are more about
security and inconvenience (try reading a pickle from a command-line
database client and you'll want to shoot yourself).

[0] http://irmen.home.xs4all.nl/pyrolite/





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Thu, May 2, 2013 at 3:34 PM, Claudio Freire klaussfre...@gmail.com wrote:
 Without the C extension:
 ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
 20811734   27.829    0.000   27.855    0.000  attributes.py:171(__get__)
  7631984   13.532    0.000   31.851    0.000  ruby.py:86(get_param)

 With the C extension:
 ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
  7631984   19.514    0.000   21.051    0.000  ruby.py:86(get_param)

 Notice how the C extension saves a total of 10s (cumtime, sum of internal
 and external time).

 There's no DB access when hitting those arguments, as everything has been
 eagerly loaded. It's all function call overhead.

 Assuming an application makes heavy use of attributes, as get_param does
 (expectable of straightforward code I'd think), that's a 30% speedup of
 CPU-bound code.

 As soon as I get GC right I'll post the patch.


So... I got GC right (I think). I had to remove a few lines from
profiles.txt because, obviously, there are a lot fewer function calls
now.

There's a second patch, that adds __slots__ to instance state. I found
it speeds up things, marginally, but consistently (State.__init__ was
another function weighing a lot because of millions of calls, this is
the only way I found to speed it up).

I'll get around to the Py3 things now.

PS: Sorry I based it on 0.7.10... my app runs on that... I imagine I
could upgrade to 0.8 with little effort, but never got around to
actually doing it.





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Mon, May 6, 2013 at 1:10 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, May 2, 2013 at 3:34 PM, Claudio Freire klaussfre...@gmail.com wrote:
 Without the C extension:
 ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
 20811734   27.829    0.000   27.855    0.000  attributes.py:171(__get__)
  7631984   13.532    0.000   31.851    0.000  ruby.py:86(get_param)

 With the C extension:
 ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
  7631984   19.514    0.000   21.051    0.000  ruby.py:86(get_param)

 Notice how the C extension saves a total of 10s (cumtime, sum of internal
 and external time).

 There's no DB access when hitting those arguments, as everything has been
 eagerly loaded. It's all function call overhead.

 Assuming an application makes heavy use of attributes, as get_param does
 (expectable of straightforward code I'd think), that's a 30% speedup of
 CPU-bound code.

 As soon as I get GC right I'll post the patch.


 So... I got GC right (I think). I had to remove a few lines from
 profiles.txt because, obviously, there's a lot less function calls
 now.

 There's a second patch, that adds __slots__ to instance state. I found
 it speeds up things, marginally, but consistently (State.__init__ was
 another function weighing a lot because of millions of calls, this is
 the only way I found to speed it up).

 I'll get around to the Py3 things now.

 PS: Sorry I based it on 0.7.10... my app runs on that... I imagine I
 could upgrade to 0.8 with little effort, but never got around to
 actually doing it.


Stupid me... forgot to attach them.





SQLAlchemy-0.7.10-cinstrumented.patch
Description: Binary data


SQLAlchemy-0.7.10-slotstate.patch
Description: Binary data


Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Mon, May 6, 2013 at 1:50 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 that's a lot of effort there.  How confident are you that memory and 
 references are handled correctly in the .c code?

Quite. It's not my first C extension. But, truly, C is complex.

 That's a lot of C code, and it took years for us to iron out all the memory 
 leaks in the existing C extensions that we had - the original author 
 eventually stopped maintaining them, and I had to take it all on myself and 
 spend weeks learning the code and ironing out remaining, subtle issues (like 
 http://hg.sqlalchemy.org/sqlalchemy/rev/8326 and 
 http://hg.sqlalchemy.org/sqlalchemy/rev/8140).   These are very insidious 
 issues as they can't be diagnosed by usual gc reference counting.


There's an answer to those problems that I hesitated to propose, but you might
want to consider it: Pyrex. Or Cython. Take your pick. They *generate* C code,
so it would be rather simple to replace the C extensions with them, and they
look a lot more like Python and are a lot more fool-proof. Really, Pyrex is
made for this kind of work. It's begging you.

The only cost is an extra dependency (and the learning curve, which is there,
but far flatter than C's).





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Mon, May 6, 2013 at 2:31 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 did you generate your code here with pyrex?If you want to jump in and 
 rework our C extensions to be pyrex based and everything works out just as 
 well or better than before, it'll be a great 0.9/1.0 feature.I've got a 
 bit of experience with cython already as I've worked on lxml a bit, cython 
 vs. pyrex any thoughts ?  based on 
 http://docs.cython.org/src/userguide/pyrex_differences.html they seem pretty 
 similar (though cython seems more commonplace...)

Cython moves a lot faster, but that's also its drawback at times.
I've stuck to Pyrex when I don't need Cython's benefits, because
Pyrex is far more stable and easier to depend on.

For this kind of work, I'd suggest Pyrex. But really, both work.

I might try that, after checking Pyrex's compatibility with Py3...
I've never done that.





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Mon, May 6, 2013 at 2:31 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 did you generate your code here with pyrex?


Oh, sorry, I didn't answer this.

No. I wrote it by hand.

Pyrex-generated code is inscrutable, not that there's any need to scrutinize
it. But really, when using Pyrex, the C file ought to be considered merely an
intermediate file. The sources are the .pyx files (see the build sketch below).
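
For what it's worth, a rough sketch of what the build side could look like if
Cython were used (Pyrex has an analogous distutils hook); the module path and
package name are hypothetical:

from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize

setup(
    name='SQLAlchemy-cext',
    # the .pyx file is the real source; the generated .c is a build artifact
    ext_modules=cythonize([
        Extension('sqlalchemy.cinstrumented',
                  ['lib/sqlalchemy/cinstrumented.pyx']),
    ]),
)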





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-06 Thread Claudio Freire
On Mon, May 6, 2013 at 4:27 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 as for the __slots__ thing, that's a separate issue.if your patch doesn't 
 break tests we can set that for 0.9 as well, I doubt anyone is subclassing 
 InstanceState, though I'd want to see what the speedup is with that.

About 15% on state creation (which can easily be a big chunk of any
bulk ORM operations):

>>> class InstanceState(object):
...     __slots__ = ('a', 'b', 'c', '__dict__', '__weakref__')
...     def __init__(self):
...         self.a = 1
...         self.b = 2
...         self.c = 3
...
>>> class InstanceStateSlow(object):
...     def __init__(self):
...         self.a = 1
...         self.b = 2
...         self.c = 3
...
>>> def test(which):
...     for i in xrange(10):
...         x = which()
...
>>> import timeit
>>> timeit.timeit(lambda: test(InstanceStateSlow))
8.486893892288208
>>> timeit.timeit(lambda: test(InstanceState))
7.35853814697





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-05-02 Thread Claudio Freire
On Fri, Apr 26, 2013 at 8:59 PM, Michael Bayer mike...@zzzcomputing.com
wrote:
 All attributes have to be expire-able and act as proxies for a
database connection so I'm not really sure where to go with that.I'm
not too thrilled about proposals to build in various alternate
performance behaviors as the library starts to try to act in many
different ways that the vast majority of users aren't even aware of, it
increases complexity internally, produces vast amounts of new use cases to
test and maintain, etc.I'm always willing to look at patches that are
all winning, of course, so if you have some way to speed things up without
breaking usage contracts and without major new complexity/brittleness I'd
love to look at a pull request.

 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...


 In fact... I'm gonna try that...

 feel free!  though you might be surprised, a C function that just calls
out to all the same Python operations anyway is often only negligibly
faster, not enough to make the extra complexity worth it.


Ok, I got around to profiling this. The C extension saves 20s out of 800s. As
noted before, most of those 800s are SA-unrelated application logic, and the
app has been heavily optimized to avoid attribute access, so the average
speedup elsewhere would most likely be far greater.

As expected, the getter disappears from the profiles (its time is now
attributed to the calling function). I've got a function that makes some
unavoidable accesses to instrumented attributes: get_param. It's called around
7M times, so while each access is small, the overhead does add up. I made sure
expiration still works, btw.

Without the C extension:
   ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
 20811734   27.829    0.000   27.855    0.000  attributes.py:171(__get__)
  7631984   13.532    0.000   31.851    0.000  ruby.py:86(get_param)

With the C extension:
   ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
  7631984   19.514    0.000   21.051    0.000  ruby.py:86(get_param)

Notice how the C extension saves a total of 10s (cumtime, sum of internal
and external time).

There's no DB access when hitting those attributes, as everything has been
eagerly loaded. It's all function call overhead.

Assuming an application makes heavy use of attributes, as get_param does
(which I'd expect of straightforward code), that's a 30% speedup of
CPU-bound code.

As soon as I get GC right I'll post the patch.
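
(For reference, profile tables like the ones above are just cProfile/pstats
output; a generic sketch follows, with workload() standing in for the
ORM-heavy code path:)

import cProfile
import pstats

def workload():
    return sum(i * i for i in xrange(1000000))

cProfile.run('workload()', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('tottime').print_stats(10)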





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-30 Thread Claudio Freire
On Fri, Apr 26, 2013 at 9:09 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways 
 that the vast majority of users aren't even aware of, it increases 
 complexity internally, produces vast amounts of new use cases to test 
 and maintain, etc.I'm always willing to look at patches that are all 
 winning, of course, so if you have some way to speed things up without 
 breaking usage contracts and without major new complexity/brittleness 
 I'd love to look at a pull request.

 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...


 In fact... I'm gonna try that...

 feel free!  though you might be surprised, a C function that just calls out 
 to all the same Python operations anyway is often only negligibly faster, 
 not enough to make the extra complexity worth it.

 also if you're looking to help with C, I'd love to get the C extensions out 
 in the Py3K version, we have a patch that's fallen out of date at 
 http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and 
 testing.

 Will look into that. The point of the C function is to be able to
 quickly bypass all that _supports_population and function call
 overheads. The getter is dead-simple, so its cost is dominated by
 CPython function call overheads, that are readily removable by
 re-implementing in C. It can reliably and quickly detect when
 instance_dict returns __dict__, too.

Alright, I've got a POC C extension working (gotta profile it yet),
although SQLAlchemy's weird injection of instance_dict forced me to
some ugly hacks:


class InstrumentedAttribute(QueryableAttribute):
    """Class bound instrumented attribute which adds descriptor methods."""

    def __set__(self, instance, value):
        self.impl.set(instance_state(instance),
                      instance_dict(instance), value, None)

    def __delete__(self, instance):
        self.impl.delete(instance_state(instance), instance_dict(instance))

    try:
        from sqlalchemy.cinstrumented import InstrumentedGetter
        __get__ = InstrumentedGetter(globals())
        __get__.__name__ = '__get__'
        del InstrumentedGetter
    except ImportError:
        def __get__(self, instance, owner):
            if instance is None:
                return self

            dict_ = instance_dict(instance)
            if self._supports_population and self.key in dict_:
                return dict_[self.key]
            else:
                return self.impl.get(instance_state(instance), dict_)

Thing is, doing the whole class in C makes no sense for set and delete, and it
would also complicate linking its instance_dict and instance_state to the ones
in SA's attributes module.

This way looks ugly, but it reacts immediately to changes to those globals, so
it does seem like the better option.

Opinions (while I profile)?





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-30 Thread Claudio Freire
On Tue, Apr 30, 2013 at 8:06 PM, Michael Bayer mike...@zzzcomputing.com wrote:
try:
    from sqlalchemy.cinstrumented import InstrumentedGetter
    __get__ = InstrumentedGetter(globals())
    __get__.__name__ = '__get__'
    del InstrumentedGetter
except ImportError:
    def __get__(self, instance, owner):
        if instance is None:
            return self

        dict_ = instance_dict(instance)
        if self._supports_population and self.key in dict_:
            return dict_[self.key]
        else:
            return self.impl.get(instance_state(instance), dict_)

 Thing is, doing the whole class in C makes no sense for set and
 delete, but it also complicates linking its instance_dict and
 instance_state to SA.attribute's.

 This way looks ugly, but it reacts immediately to changing those
 globals, so it does seem like the better option.

 Opinions (while I profile)?

 I'd want to see the whole thing, like what's up with that globals() call, 
 etc.   The instance_dict is shuttled around everywhere so that we aren't 
 constantly pulling it from the given object; we have a system in place 
 whereby the fact that instance_dict is object.__dict__ is not necessarily a 
 given, and you can actually use some other system of getting at __dict__.   
 It saved us on a huge number of function calls at some point to expand it out 
 like that, as inconvenient as it is.

I realize that. That globals call is exactly for that.

Well, without going into the C code, the python code does:

dict_ = instance_dict(instance)

That under the hood means:

dict_ = globals()['instance_dict'](instance)

Well, globals there is bound syntactically in python code, but C code
has no globals, and injecting instance_dict into cinstrumented's
module dict sounded like extra complexity for no reason.

Importing the attributes module from within cinstrumented is also no good,
since at the point the InstrumentedGetter is constructed, attributes isn't in
sys.modules yet, and doing the import on every invocation would have meant
extra overhead.

So, that globals call is to mimic python's syntactic binding to the
module's global dict, at import time, and be able to query the dict
and find instance_dict no matter how it's modified later.

Afterward, the getter works more or less like this (in C):

def __get__(self, instance, owner):
    if instance is None:
        return self
    if self.cached_instance_dict is not None \
            and self.cached_instance_dict is instance_dict \
            and self.cached_supports_population \
            and hasattr(instance, '__dict__'):
        return instance.__dict__[self.key]
    else:
        self.cached_supports_population = self._supports_population
        self.cached_instance_dict = None
        dict_ = instance_dict(instance)
        if dict_ is instance.__dict__:
            self.cached_instance_dict = instance_dict
        return self.impl.get(instance_state(instance), dict_)

Well, in spite of being more complicated, those self.cached_* checks are
really fast since they just compare pointers in C. More importantly,
entity.column now goes from CPython's eval loop (C) directly into the
descriptor's getter (C), never incurring Python's frame allocation overhead.

I'm attaching the C module in case it clarifies.

I'm not entirely sure about the garbage collection part yet... so it's
not final.



/*
instrumented.c
Copyright (C) 2013 Claudio Freire klaussfre...@gmail.com

This module is part of SQLAlchemy and is released under
the MIT License: http://www.opensource.org/licenses/mit-license.php
*/

#include <Python.h>

#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
typedef Py_ssize_t (*lenfunc)(PyObject *);
#define PyInt_FromSsize_t(x) PyInt_FromLong(x)
typedef intargfunc ssizeargfunc;
#endif

#if PY_VERSION_HEX >= 0x03000000
#define PyString_InternFromString PyUnicode_InternFromString
#endif


PyObject *get_string = NULL;
PyObject *uget_string = NULL;

/***
 * Structs *
 ***/

typedef struct {
PyObject_HEAD

/* Where to get instance_dict from */
PyObject* globals;

/* Name to which it was bound */
PyObject* name;

/* non-reference, just a pointer for identity comparison */
void *cached_instance_dict;

/* Only valid if cached_instance_dict != NULL and equal to global instance_dict */
int cached_supports_population;
} InstrumentedGetter;

/**
 * InstrumentedGetter

Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 12:06 PM, Werner werner.bru...@sfr.fr wrote:
 On 26/04/2013 16:41, alonn wrote:

 so not to load too much into memory I should do something like:

 for i in session.query(someobject).filter(idsomething)
 print i

 I'm guessing the answer is no, because of the nature of sql, but I'm not
 an expert so I'm asking.

 yes you can, check out the doc for querying, e.g. the following if you use
 the ORM.

 http://sqlalchemy.readthedocs.org/en/rel_0_8/orm/tutorial.html#querying


Not entirely, at least not if you don't use yield_per (as shown in the docs
in fact, but worth mentioning).

Looking at Query's code:

        if self._yield_per:
            fetch = cursor.fetchmany(self._yield_per)
            if not fetch:
                break
        else:
            fetch = cursor.fetchall()

Not only that, but all rows are also processed and saved to a local list, so
all instances are built and populated way before you get the first row. That
is, unless you specify yield_per (see the sketch below).
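
For completeness, a self-contained sketch of the yield_per variant; the model,
the throwaway SQLite database and the filter are made up for the example:

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SomeObject(Base):
    __tablename__ = 'someobject'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([SomeObject(id=i) for i in range(1, 1001)])
session.commit()

# yield_per(N) fetches rows in batches of N and hands out ORM instances as it
# goes, instead of building the full result list before the loop body runs
for obj in session.query(SomeObject).filter(SomeObject.id > 0).yield_per(100):
    pass  # process obj here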





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 12:24 PM, Werner werner.bru...@sfr.fr wrote:
 On 26/04/2013 17:07, Claudio Freire wrote:

 On Fri, Apr 26, 2013 at 12:06 PM, Werner werner.bru...@sfr.fr wrote:

 http://sqlalchemy.readthedocs.org/en/rel_0_8/orm/tutorial.html#querying


 Not entirely, if you don't use yield_per (as shown in the docs in
 fact, but worth mentioning).

 Seeing query:

 if self._yield_per:
  fetch = cursor.fetchmany(self._yield_per)
  if not fetch:
  break
 else:
  fetch = cursor.fetchall()

 Not only that, but also all rows are processed and saved to a local
 list, so all instances are built and populated way before you get the
 first row. That is, unless you specify yield_per.

 Oops, thanks for correcting me.

Um... a tad OT, but looking at that code, there are lots of
opportunities for optimization.

I'll have to profile a bit and let you know.





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 1:35 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 Um... a tad OT, but looking at that code, there's lots of
 opportunities for optimization.

 I'll have to profile a bit and let you know.

 are you referring to sqlalchemy/orm/loading.py ?   I'd be pretty impressed if 
 you can find significant optimizations there which don't break usage 
 contracts.I've spent years poring over profiles and squeezing every 
 function call possible out of that system, sometimes producing entirely new 
 approaches that I just had to throw out since they didn't work.   It has been 
 rewritten many times.   Some background on the approach is at 
 http://www.aosabook.org/en/sqlalchemy.html, 20.7. Query and Loading 
 Behavior.


I know... I'm talking about micro-optimizations. Pre-binding globals in tight
loops, for instance, like:

def filter_fn(x, tuple=tuple, zip=zip):
    return tuple(...)

This is of course only worth it for really, really hot loops. That's why I'm
profiling. Maybe it's been done already for all the hottest loops.

Then there's the possibility of replacing some list comprehensions with
itertools, which, besides not building a temporary list, would also run
entirely in C (see the sketch below). This, too, only makes a difference in
very tight, builtin-laden loops.
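
A tiny sketch of that second idea, under Python 2 (proc and rows are stand-ins
for illustration, not SQLA internals):

import itertools

rows = [(i, str(i)) for i in range(1000)]

def proc(row):
    return row[0] * 2

# the comprehension builds a temporary list and runs per-item bytecode
as_list = [proc(r) for r in rows]
# imap drives the same loop from C and yields lazily (Python 2 only)
as_iter = itertools.imap(proc, rows)
assert list(as_iter) == as_list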

I have an app here that really stresses that part of the ORM, so I can
profile rather easily. In previous profiles, I remember seeing
Query.instances near the top, and all the optimizations I mentioned
above could be applied there; if they make any difference, I'll let you know.





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 2:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 are you referring to sqlalchemy/orm/loading.py ?   I'd be pretty impressed 
 if you can find significant optimizations there which don't break usage 
 contracts.I've spent years poring over profiles and squeezing every 
 function call possible out of that system, sometimes producing entirely new 
 approaches that I just had to throw out since they didn't work.   It has 
 been rewritten many times.   Some background on the approach is at 
 http://www.aosabook.org/en/sqlalchemy.html, 20.7. Query and Loading 
 Behavior.


 I know... I'm talking micro-optimization. Pre-binding globals in tight
 loops, for instance, like:

 def filter_fn(x, tuple=tuple, zip=zip):
return tuple(...)

 This is of course only worth it for really really hot loops. That's
 why I'm profiling. Maybe it's been done already for all the hottest
 loops.

 Then there's the possibility to replace some list comprehensions with
 itertools, which besides not building a temp list, would also run
 entirely in C. This also only makes a difference only on very tight,
 builtin-laden loops.

 I have an app here that really stresses that part of the ORM, so I can
 profile rather easily. In previous profiles, I remember seeing
 Query.instances near the top, and all the optimizations I mentioned
 above could be applied there, if they make any difference I'll tell.

 the real bottleneck in loading is the loading.instances() function.  I have 
 tried for years to reduce overhead in it.  Writing it in C would be best, but 
 then again Pypy aims to solve the problem of FN overhead, pre-binding, and 
 such.   I don't want to work against Pypy too much.

That makes the proposition tricky. I don't know PyPy's performance
characteristics that well. I assume pre-binding wouldn't hurt PyPy much, since
loop traces would be nearly the same, but I've never tested it.

Pre-binding in filter_fn improves its runtime ten-fold. More precisely,
pre-binding plus replacing tuple(genexpr) with tuple([listcomp]), since
genexprs are rather slow compared to list comprehensions. The improvement
accounts for 1% of my test's runtime, so if it hurts PyPy, it might not be so
great an optimization (if it doesn't, though, it's a very cheap one, and it
could be applicable in other places). This particular one helps in the case of
query(Column, Column, Column), which I use a lot.

Note, however, that my test is 40% waiting on the DB, so CPU usage
impact would be proportionally bigger, especially with parallel
workers (I'm using just one thread when profiling though).

Doing those small optimizations to WeakIdentityMap (another one whose
methods are called an obscenely large number of times), I get about a
10% speedup on them. I imagine that could count in some situations.

Ultimately, though, it's InstrumentedAttribute.__get__ that's sucking up 30%
of alchemy-bound CPU time. I guess there's little that can be done, since it's
necessary to track state changes. But there's a neat property of descriptors:
if they don't implement __get__, they don't take precedence over the
instance's dict on reads.

This is very interesting, and handy, because when instance_dict is
attrgetter('__dict__'), then for regular ColumnPropertys, instead of using
InstrumentedAttribute, I can substitute an InstrumentedWriteAttribute that has
no getter. This means, all of a sudden, no overhead for simple attribute
access (a standalone sketch of the idea follows below).
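
A standalone sketch of that descriptor property; WriteOnlyAttr is a made-up
stand-in for illustration, not SQLAlchemy's actual class:

class WriteOnlyAttr(object):
    # defines __set__ only; with no __get__, the instance __dict__ wins on reads
    def __init__(self, key):
        self.key = key

    def __set__(self, instance, value):
        # interception point for change tracking
        instance.__dict__[self.key] = value

class Entity(object):
    name = WriteOnlyAttr('name')

e = Entity()
e.name = 'x'     # goes through WriteOnlyAttr.__set__
print(e.name)    # plain __dict__ lookup: no descriptor overhead on reads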

I've tested it and it mostly works. There's the instance_dict is
attrgetter('__dict__') thing hanging over my head, and the more
serious issue of lazy attributes being mostly broken, but it's an
interesting POC IMHO.

Anyway, with that (fragile) change, I get a speedup of 10% overall
runtime, and about 50% alchemy-specific runtime. Considering I knew
about attribute access' slowness and avoided it in my test, that has
to account for something worth looking into? (before optimizing for
attribute access slowness, the test was about 3 times slower IIRC -
*times* - and it does a hefty amount of regex processing beyond
handling attributes)





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 8:15 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 Anyway, with that (fragile) change, I get a speedup of 10% overall
 runtime, and about 50% alchemy-specific runtime. Considering I knew
 about attribute access' slowness and avoided it in my test, that has
 to account for something worth looking into?

 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways that the 
 vast majority of users aren't even aware of, it increases complexity 
 internally, produces vast amounts of new use cases to test and maintain, etc. 
I'm always willing to look at patches that are all winning, of course, so 
 if you have some way to speed things up without breaking usage contracts and 
 without major new complexity/brittleness I'd love to look at a pull request.

I know, it's just a probe to see what kind of a speedup could be
obtained by not having that getter's interference. You know... simply
implementing InstrumentedAttribute in C could do the trick...

 (before optimizing for
 attribute access slowness, the test was about 3 times slower IIRC -
 *times* - and it does a hefty amount of regex processing beyond
 handling attributes)

 Im not sure what regexes you're referring to here.

Oh, it's just application-specific regexes. The point was that there's
a lot of application-specific processing, so the speedup must be big
to be observable through the interference.





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 8:47 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Fri, Apr 26, 2013 at 8:15 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:

 Anyway, with that (fragile) change, I get a speedup of 10% overall
 runtime, and about 50% alchemy-specific runtime. Considering I knew
 about attribute access' slowness and avoided it in my test, that has
 to account for something worth looking into?

 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways that 
 the vast majority of users aren't even aware of, it increases complexity 
 internally, produces vast amounts of new use cases to test and maintain, 
 etc.I'm always willing to look at patches that are all winning, of 
 course, so if you have some way to speed things up without breaking usage 
 contracts and without major new complexity/brittleness I'd love to look at a 
 pull request.

 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...


In fact... I'm gonna try that...





Re: [sqlalchemy] Are sqlalchemy queries a generator?

2013-04-26 Thread Claudio Freire
On Fri, Apr 26, 2013 at 9:01 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 All attributes have to be expire-able and act as proxies for a database 
 connection so I'm not really sure where to go with that.I'm not too 
 thrilled about proposals to build in various alternate performance 
 behaviors as the library starts to try to act in many different ways that 
 the vast majority of users aren't even aware of, it increases complexity 
 internally, produces vast amounts of new use cases to test and maintain, 
 etc.I'm always willing to look at patches that are all winning, of 
 course, so if you have some way to speed things up without breaking usage 
 contracts and without major new complexity/brittleness I'd love to look 
 at a pull request.

 I know, it's just a probe to see what kind of a speedup could be
 obtained by not having that getter's interference. You know... simply
 implementing InstrumentedAttribute in C could do the trick...


 In fact... I'm gonna try that...

 feel free!  though you might be surprised, a C function that just calls out 
 to all the same Python operations anyway is often only negligibly faster, 
 not enough to make the extra complexity worth it.

 also if you're looking to help with C, I'd love to get the C extensions out 
 in the Py3K version, we have a patch that's fallen out of date at 
 http://www.sqlalchemy.org/trac/ticket/2161 that needs freshening up and 
 testing.

Will look into that. The point of the C function is to be able to
quickly bypass all that _supports_population and function call
overheads. The getter is dead-simple, so its cost is dominated by
CPython function call overheads, that are readily removable by
re-implementing in C. It can reliably and quickly detect when
instance_dict returns __dict__, too.





Re: [sqlalchemy] Doing reflection with multiple remote databases.

2013-01-16 Thread Claudio Freire
On Tue, Jan 15, 2013 at 8:48 PM, Hetii ghet...@gmail.com wrote:
 Even when i dump all of them into declarative base model, its still huge
 amount of data that need to be parsed and loaded.

 I want to ask if its possible to share table/column definition across
 different database models to reduce amount of used resources?

Even if you can't share the objects themselves (not sure you can, you
probably can't), you can share the code that generates them.

Remember Python is dynamic, and a class statement is code that actually
creates a class object:

def init():
    class Blah(object):
        pass  # table/column definitions would go here
    class OtherBlah(object):
        pass  # ...and as many more classes as needed
    return locals()

globals().update(init())

You can call that init function above as many times as you want. I use
something like that to map two identical databases (master and replica) into
two namespaces I can pick from depending on the task (a usage sketch follows).
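
A hypothetical usage sketch of that pattern, reusing the init() above: one
call per database, so each namespace gets its own independent classes.

master = init()    # classes used against the master
replica = init()   # an independent, identical set for the replica
MasterBlah = master['Blah']
ReplicaBlah = replica['Blah']
assert MasterBlah is not ReplicaBlah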




Re: [sqlalchemy] [Q] Lack of isolation in unit tests

2012-09-18 Thread Claudio Freire
On Tue, Sep 18, 2012 at 11:07 AM, Ladislav Lenart lenart...@volny.cz wrote:
 def create_base():
 return declarative_base(cls=_Base)

Move the declaration of _Base to within create_base, and I think that
should fix your problem (a sketch follows below).

(I've had a similar issue, not with test cases, but with a replica schema.)
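
A minimal sketch of what I mean, assuming the shape of the original code (the
mixin body is just a placeholder):

from sqlalchemy.ext.declarative import declarative_base

def create_base():
    class _Base(object):
        pass  # shared helpers/conventions would go here

    return declarative_base(cls=_Base)

BaseA = create_base()
BaseB = create_base()
assert BaseA is not BaseB   # each call yields a fully independent base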




Re: [sqlalchemy] Another Parent instance is not bound to a Session; lazy load...

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 12:50 AM, Michael Bayer
mike...@zzzcomputing.com wrote:

 Thing is, in order to work with a large volume of objects, you're
 forced to do this, otherwise the session can grow uncontrollably.

 flush periodically, and don't maintain references to things you're done with. 
  The Session does not strongly reference objects that have no pending 
 changes, and they'll be garbage collected.

Problem is, I'm stuck with strongly-referencing sessions. The app
comes from SA 0.3, and is heavily relying on the session as a kind of
L1 cache - removing that assumption is a really huge task we haven't
gotten to. We managed to upgrade it to SA 0.5, but we kept
strongly-referencing sessions.

 The problems you're having are from unnecessary detachment of objects, from 
 calling Session.close() and continuing to work with objects that have lost 
 their owning Session, within the context of a new Session they have no 
 association with.

I've been solving those problems by reattaching objects to the session. Only
with caches has that not worked, since cached objects will be used by many
threads at once, so no single session can own them.

 Another case in which an object's lifespan can exceed the session's,
 is when you want to implement caching with objects of your data model
 - cached values will have come from other sessions than the current
 one, and things get horribly messy.

 There are documented patterns for caching - see the example in 
 examples/beaker_caching in the distro.   This pattern is designed to cleanly 
 handle the pattern of detached objects becoming re-associated with a 
 particular session at once.   The pattern is along the lines of, session is 
 created to work with a field of objects, a set of objects is retrieved from 
 the cache, then re-associated with the cache en-masse using the 
 merge_result() method illustrated in the example.

Interesting, I hadn't seen that example. Bookmarked already :-)

Thanks.




Re: [sqlalchemy] Another Parent instance is not bound to a Session; lazy load...

2012-05-30 Thread Claudio Freire
On Wed, May 30, 2012 at 8:39 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 well yes, the way you're doing this is entirely the opposite of how the ORM
 is designed to function.    The Session has been developed in order to work
 in an intelligent manner with full graphs of interrelated objects, all
 coordinated under the umbrella of a transaction which applies atomicity and
 isolation to the full series of operations.    An individual object loaded
 from a Session is in fact an extension of that Session's state, which is in
 turn an extension of the state of the current transaction within the
 database, all kept in sync mostly automatically.   When you continuously
 break this connection between a whole series of interconnected objects
 you'll run into not just lots of problems keeping objects associated with
 continuously new transactions, but also horrendous performance from due to
 the excessive number of commits and re-acquisition of new
 connections/transactions.

 In general, the Session is typically used in a bounding sense, its
 lifespan beginning before you work with any objects, and ending only after
 you've completed the work with those objects.    I frequently use the
 metaphor here that the Session is the table setting and the objects are the
 meal.

Thing is, in order to work with a large volume of objects, you're forced to do
this; otherwise the session can grow uncontrollably. When you split the
operation into batches, you almost always have some objects whose lifespan is
longer than a single batch, and thus longer than a single session.

Another case in which an object's lifespan can exceed the session's is when
you want to implement caching with objects of your data model: cached values
will have come from sessions other than the current one, and things get
horribly messy.

And, finally, there's the long-transaction pattern, which luckily is a lot
easier to solve. If the OP is using that pattern, it's just a matter of
reattaching detached objects to the session (see the sketch below).
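
For instance, a self-contained sketch of reattaching a detached object; the
Thing model and the throwaway SQLite database are made up for the example:

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Thing(Base):
    __tablename__ = 'thing'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, expire_on_commit=False)

s1 = Session()
thing = Thing(id=1)
s1.add(thing)
s1.commit()
s1.close()               # thing is now detached

s2 = Session()
thing = s2.merge(thing)  # reattach: state merged onto an instance owned by s2
print(thing.id)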




Re: [sqlalchemy] Understanding memory usage under SQLA

2012-05-17 Thread Claudio Freire
On Thu, May 17, 2012 at 10:21 AM, Michael Bayer
mike...@zzzcomputing.com wrote:
 There's a few different parts to what you're asking.

 The first is that you're comparing Python's use of OS memory (I'm assuming
 this is the 200+ MB) to Python's actual amount of objects present.   This is
 a common mistake.  Python up through version 2.6 does not release memory
 back to the OS once taken - this was improved in 2.7.   There's an old
 article about this here:
  http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
 as well as Alex Martelli's answer: http://stackoverflow.com/a/1316799/34549

I think you're mistaken about the version number in which that was changed.
The article you pointed to (and my memory) seems to say it was in 2.5, not 2.7.

 Second is, what exactly is the large object you're creating here ?   Answer
 - first, psycopg2 by default buffers the result set fully before returning
 it to SQLAlchemy - so it is first a list of 5000 tuples.  Second, the ORM
 itself also by default buffers the full set of rows from the result set in
 the form of mapped objects, so 5000 objects plus their related objects.    A
 way to modify this behavior is to use the yield_per() option of Query, which
 will also in the case of psycopg2 tell psycopg2 to use its server side
 cursors feature which does not buffer.

It's important to note that everything you mention on the Python side would
show up in heapy. But psycopg2's buffers would not, nor would libpq's.

 On May 17, 2012, at 7:33 AM, Vlad K. wrote:
 The following is source of an example case. Requires SQLAlchemy (tested with
 0.7.5 and 0.7.7), guppy, psycopg2 (tested with 2.4.2 and 2.4.4). Happens
 both on Fedora 15 64-bit and CentOS 6.2 32-bit, though of course the 32-bit
 shows some 30% lower RES in top.

Update to psycopg2 2.4.5, it fixes some memory leaks (they call them
reference counting problems):
http://initd.org/psycopg/articles/2012/03/29/psycopg-245-released/

Also... which version of libpq did you build psycopg2 with?

 I don't see where and how would any objects remain in memory, and heapy
 showing much lower memory use suggests something is retained in the involved
 C extensions?

C extensions, or memory fragmentation.

If you read spanish, check this out:
http://python.org.ar/pyar/Charlas#Depuraci.2BAPM-n_y_defragmentaci.2BAPM-n_de_memoria_en_Python

If not... ehm... try google translate ;-)

 Heapy before selecting rows:
 Heapy after 5000 rows have been selected:

Try the difference for more insight:

from guppy import hpy
hp = hpy()
h1 = hp.heap()
# ... do some work ...
h2 = hp.heap()
print repr(h2 - h1)




Re: [sqlalchemy] Understanding memory usage under SQLA

2012-05-17 Thread Claudio Freire
On Thu, May 17, 2012 at 11:57 AM, Vlad K. v...@haronmedia.com wrote:
 PostgreSQL 9.0.7, libpq is part of the same version -devel package
 PostgreSQL 9.1.3 (via pgsql yum repo), libpq is part of the same version
 -devel package

9.1 (the one you have in production) also has leak-related fixes. They don't
necessarily apply to your problem (they're somewhat esoteric), but it's worth
noting.

Even the tiniest memory leak will fragment your process's heap and cause it
not to release those 200M.

In those cases, especially when the leaks are one-time (as in the libpq case),
it's convenient to force the leak to happen at load time.

Precompiling queries in SQLA to populate SQLA's various compiler caches, and
running a few queries that exercise libpq and psycopg2 (and thus make them
allocate whatever permanent data structures they need), all at load time, will
help keep fragmentation to a minimum (a sketch follows below).

Fragmentation is a complex issue, and both Python and SQLA are quite prone to
it. But it can be worked around.
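
A hypothetical warm-up along those lines; the DSN, the table and the statement
are all placeholders:

import sqlalchemy as sa

engine = sa.create_engine('postgresql://user:pass@localhost/db')
meta = sa.MetaData()
users = sa.Table('users', meta, sa.Column('id', sa.Integer, primary_key=True))

# compile a representative statement once, at import time, so SQLA's
# compilation caches get allocated while the heap is still small
warmup = sa.select([users.c.id]).where(users.c.id == sa.bindparam('uid'))
warmup.compile(engine)

# run one trivial round trip so psycopg2/libpq allocate their permanent
# structures early as well
conn = engine.connect()
conn.execute(sa.text('SELECT 1')).fetchall()
conn.close()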




Re: [sqlalchemy] Mapper compilation errors in multi threaded web application using dynamic mapping to selects

2012-05-09 Thread Claudio Freire
On Wed, May 9, 2012 at 4:11 PM, Michael Bayer mike...@zzzcomputing.com wrote:

 Hello,

 is there a problem in mapping classes to selects ([1]) /within a function/?

 with multiple threads, where the mappers initialization may first proceed as 
 the product of a thread running, yes.     you'd want to upgrade to 0.7 for 
 the best versions of these fixes, or at least 0.6.

 If you must stay on 0.5, make sure all modules are fully imported, then run 
 compile_mappers() before starting any threads.

Sweet.

Didn't know that one.




Re: [sqlalchemy] QueuePool limit size reached, using expression API only

2012-04-09 Thread Claudio Freire
On Mon, Apr 9, 2012 at 4:03 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 close your connections after you are finished with them.

They should be automatically returned to the pool when unreferenced.

The OP may be storing stray references somewhere, or attaching them somehow to
a reference cycle that takes time to be freed.

In any case, explicit closing may not be the greatest idea (that connection
won't go back to the pool, I think - not sure, please SA gurus confirm);
rather, they should be de-referenced thoroughly.




Re: [sqlalchemy] QueuePool limit size reached, using expression API only

2012-04-09 Thread Claudio Freire
On Mon, Apr 9, 2012 at 4:36 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 Using context managers, i.e. with engine.connect() as conn, is the most 
 straightforward.

IIRC, context managers are new in SA, aren't they?




Re: [sqlalchemy] QueuePool limit size reached, using expression API only

2012-04-09 Thread Claudio Freire
On Mon, Apr 9, 2012 at 6:50 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 better one to use from engine is begin() (also new in 0.7.6):

 with engine.begin() as conn:
    ...

 that way everything you do with conn is on the same transaction.

Yeah, because I'm using 0.5.8 (and couldn't switch to 0.6.x yet; the
app breaks with it).
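
(For anyone on newer versions, a rough sketch of the two forms quoted above,
available from 0.7.6 on; the table is a throwaway example:)

from sqlalchemy import create_engine, text

engine = create_engine('sqlite://')
engine.execute(text('CREATE TABLE t (x INTEGER)'))

# connection goes back to the pool when the block exits
with engine.connect() as conn:
    conn.execute(text('INSERT INTO t (x) VALUES (1)'))

# begin() additionally wraps the block in a transaction
with engine.begin() as conn:
    conn.execute(text('UPDATE t SET x = 2'))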




Re: [sqlalchemy] Reducing instrumentation overhead for read-only entities

2012-03-05 Thread Claudio Freire
On Sun, Feb 19, 2012 at 6:27 AM, Andrey Popp 8may...@gmail.com wrote:
 I've managed quite efficient inlining with bytecode magic. It's especially
 effective with SQLA code, since it also specializes the inlined function,
 removing a lot of dead code (in the call context).

 That's pretty interesting!

 I could share the code if you're interested, but it is rather hard to
 maintain (it's tied to the bytecode, which is version-specific in CPython)
 and it's experimental code, so it's rather ugly.

 Yeah, that would be great, thanks!

Finally, I got access to it.

So, this[0] is the core code. The easiest way to invoke it is to install it as
an import hook (see the install function), which will then process all imports
from that point on.

The code is only tested in 2.6, and has issues with 2.7, not to
mention 3.x. Might work with 2.5/2.4 though.

[0] http://pastebin.com/9TN6zJKc




Re: [sqlalchemy] postgres with .. select queries

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 12:11 PM, A.M. age...@themactionfaction.com wrote:
 Well, the SQL standard points at WITH RECURSIVE which is more general anyway. 
 W.R. is basically an inductive query loop construct (base case UNION 
 induction step) where CONNECT BY only handles key-based tree retrieval, no?

 Also, basic WITH support (without RECURSIVE) would be much appreciated- that 
 could offer more flexibility than FROM-subqueries and could open the door for 
 W.R.

Also, CTE support (recursive or not) would be easily implementable by a
special case of selectable whose visitor only outputs the name and prepends
the "WITH blah AS bleh" part to the string.


import sqlalchemy as sa
somestuff = sa.select(...).cte(somestuff)
somequery = sa.select( somestuff.c.somecolumn, somestuff.c.someval == 3)

Just to give an example.

I think this form makes a lot more sense in the SQLA API than the
context manager. This form can be joined and operated on like any
other table, and it could even be recursive if proper care is taken
and generative API is used:

somestuff = sa.select(...).cte(somestuff)
somestuff.append_from(somestuff.recurse)
somestuff.append_whereclause(somestuff.c.id == somestuff.recurse.c.parent_id)

Just a possibility.




Re: [sqlalchemy] postgres with .. select queries

2012-03-01 Thread Claudio Freire
On Thu, Mar 1, 2012 at 5:08 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 OK also, this is ready to go in from my perspective, I don't make usage of 
 CTEs in my normal work at the moment, so hopefully those people here 
 interested in the feature can give this a review, maybe even try the patch, 
 because this will be it !

Why is joining not supported in that patch?

I would imagine it should be possible, with explicit join conditions of course.




Re: [sqlalchemy] Sequences, Primary Keys, Relations

2012-02-23 Thread Claudio Freire
On Thu, Feb 23, 2012 at 4:49 PM, Adam Tauno Williams
awill...@whitemice.org wrote:
 This works fine.  But if I create a Project object I can't relate it to
 a ProjectInfo object within the same transaction without calling flush()
 first.  Is there some way to encourage SQLalchemy to allocate a value
 from the sequence when the object is created?

If you just use the relation (project.info = project_info), alchemy
will do everything for you.




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Claudio Freire
On Wed, Feb 22, 2012 at 4:29 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 thanks for your reply. I haven't yet tested this with a profiler to see 
 exactly what exactly is happening, but the bottom line is that the overall 
 memory use grows with each iteration (or transaction processed), to the 
 point of grinding the server to a halt, and top shows only the Python 
 process involved consuming all the memory.

 yeah like I said that tells you almost nothing until you start looking at 
 gc.get_objects().  If the size of gc.get_objects() grows continuously for 50 
 iterations or more, never decreasing even when gc.collect() is called, then 
 it's a leak.  Otherwise it's just too much data being loaded at once.

I've noticed compiling queries (either explicitly or implicitly) tends
to *fragment* memory. There seem to be long-lived caches in the PG
compiler at least. I can't remember exactly where, but I could take
another look.

I'm talking about rather old versions of SQLA, 0.3 and 0.5.
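
For what it's worth, a minimal sketch of the kind of check Michael
describes (do_one_iteration() is just a stand-in for one pass of the
real workload):

import gc

def check_for_leak(iterations=50):
    counts = []
    for _ in range(iterations):
        do_one_iteration()              # stand-in for one transaction of the workload
        gc.collect()                    # force a full collection before measuring
        counts.append(len(gc.get_objects()))
    # a real leak shows up as a count that keeps growing and never drops back
    print(counts)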




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Claudio Freire
On Wed, Feb 22, 2012 at 5:40 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 Saw that a bit, but looking at the tips at the bottom, concrete 
 implementation changes are not coming to mind.   An eternal structure is 
 ubiquitous in any programming language.  sys.modules is a big list of all the 
 Python modules that have been imported, each one full of functions, classes, 
 other data, these are all eternal structures - sys.modules is normally 
 never cleaned out.    I'm not seeing at what point you move beyond things 
 that are in these modules into things that are so-called eternal structures 
 that lead to inappropriate memory fragmentation.

The thing to be careful about is when those eternal structures are created.

If they're created at the beginning (like sys.modules, which is
populated by imports, and those usually happen in the preamble of .py
files), then the resulting objects get low memory addresses and thus
stay out of the way.

But if those structures are created after the program has had time to
fill its address space with transient objects (say, through lazy
imports or caches), then once the transient objects are deleted, the
eternal structures (with their high addresses) prevent the heap from
shrinking.

Such caches, for instance, are better given a limited lifespan (a
finite TTL, expiration, or periodic active cleanup). Structures that
truly have to be eternal are better populated at load time, early in
the program's lifecycle. In my backend, for instance, queries are
precompiled at startup to make sure they get low memory addresses. This
has mostly solved SQLA-related memory fragmentation issues for me.
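
To be concrete, this is roughly what I mean by precompiling at startup;
the table and query here are placeholders, the point is the compile()
against the engine's dialect happening at import time:

import sqlalchemy as sa

engine = sa.create_engine("postgresql://localhost/mydb")
metadata = sa.MetaData()
users = sa.Table('users', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('name', sa.String(50)))

# compiled once at import/startup, so the compiler's caches and the compiled
# objects get low addresses instead of fragmenting the heap later on
USER_BY_ID = sa.select([users.c.name]) \
               .where(users.c.id == sa.bindparam('uid')) \
               .compile(dialect=engine.dialect)

def get_user_name(conn, uid):
    # executing the precompiled statement skips recompilation entirely
    return conn.execute(USER_BY_ID, uid=uid).scalar()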




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Claudio Freire
On Wed, Feb 22, 2012 at 5:51 PM, Claudio Freire klaussfre...@gmail.com wrote:
 Such caches, for instance, are better made limited in lifespan (say,
 giving them a finite lifetime, making them expire, actively cleaning
 them from time to time). Structures that are truly required to be
 eternal are better populated at load time, early in the program's
 lifecycle. In my backend, for instance, queries are precompiled at
 startup, to make sure they have lower memory addresses. This has
 mostly solved SQLA-related memory fragmentation issues for me.

One source of trouble I've had here is the inability to use bind
parameters inside .in_(...).

Queries that accept variable-length lists therefore had to be
precompiled to a string, with the inside of the IN condition replaced
by string interpolation at execution time.

An ugly hack, but it served me well.
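
Roughly, the hack boils down to this (sketched with a throwaway users
table and today's literal_binds compile option rather than the 0.3-era
equivalent; only safe because the ids are trusted integers):

import sqlalchemy as sa

engine = sa.create_engine("postgresql://localhost/mydb")
metadata = sa.MetaData()
users = sa.Table('users', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('name', sa.String(50)))

# compile once, at startup, with a sentinel value inside the IN clause
template = str(
    sa.select([users.c.name])
      .where(users.c.id.in_([-1]))
      .compile(dialect=engine.dialect,
               compile_kwargs={"literal_binds": True}))

def names_for_ids(conn, ids):
    # splice the real list in by string interpolation; acceptable only
    # because the ids are plain integers under our control
    sql = template.replace("(-1)", "(%s)" % ", ".join(str(int(i)) for i in ids))
    return [row[0] for row in conn.execute(sa.text(sql))]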




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Claudio Freire
On Wed, Feb 22, 2012 at 6:21 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 IMHO the whole point of using a high level, interpreted language like Python 
 is that we don't have to be bogged down thinking like C programmers.   How 
 come I've never had a memory fragmentation issue before ?      I've made 
 precompilation an option for folks who really wanted it but I've never had 
 a need for such a thing.   And you can be sure I work on some very large and 
 sprawling SQLAlchemy models these days.

Maybe you never used big objects.

Memory fragmentation arises only when the application handles a
mixture of big and small objects, such that the holes left by freed
small objects can't satisfy the big allocations.

If your application handles a homogeneous workload (i.e. every request
is pretty much the same), as is usual, then you probably won't
experience fragmentation.

My application does the usual small-object work, interspersed with
intense computation on big objects, hence my troubles.

Python's garbage collector has been a known issue for a long time,
but, as noted in the linked page, past architectural decisions prevent
some widely desired improvements.




Re: [sqlalchemy] DetachedInstanceError after transaction

2012-02-14 Thread Claudio Freire
On Tue, Feb 14, 2012 at 6:48 PM, Jonathan Vanasco jonat...@findmeon.com wrote:
 I will admit that i have some janky db sessioning stuff going on
 behind the scenes as I get used to pyramid and the new sqlalchemy.

Then I'd say the "janky db sessioning stuff going on behind the
scenes" is closing your session after the commit.




Re: [sqlalchemy] Re: DetachedInstanceError after transaction

2012-02-14 Thread Claudio Freire
On Tue, Feb 14, 2012 at 7:19 PM, Jonathan Vanasco jonat...@findmeon.com wrote:
 could anyone point in the right direction to either:

 1. rebind an object to a new session
 or
 2. allow objects to still be 'read' in the detached state ?

Eric said it best: replace the commit() with DBSession.flush().
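
In other words (pyramid-style scoped DBSession, hypothetical MyModel
class, with the transaction manager doing the final commit):

obj = MyModel(name="example")
DBSession.add(obj)
DBSession.flush()      # issues the INSERT and assigns the primary key,
                       # but keeps the object attached to the session

print(obj.id)          # safe to read here, no DetachedInstanceError
# the enclosing transaction (e.g. pyramid_tm) commits at the end of the request

If you really do need to call commit() yourself, configuring the
session with expire_on_commit=False is another way to keep the
instances readable afterwards.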




[sqlalchemy] PDF documentation 0.5

2012-02-09 Thread Claudio Freire
Hi there.

I just tried to download the PDF documentation for 0.5 from the
documentation page[0], and the link seems to be broken (404).

Sorry if this has already been brought up on the list; it's my first
post here, and I checked the archive but couldn't find anything related.

[0] http://docs.sqlalchemy.org/en/rel_0_5/index.html




Re: [sqlalchemy] PDF documentation 0.5

2012-02-09 Thread Claudio Freire
On Thu, Feb 9, 2012 at 1:28 PM, Michael Bayer mike...@zzzcomputing.com wrote:
 I've copied all the existing PDFs to my host here: 
 http://www.sqlalchemy.org/doc_pdfs/ you can yank it from there.

Got it.

Thanks :-)
