Re: [sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.

2014-10-27 Thread James Meneghello
Using a scoped session with a session generator and I didn't want 
expire_on_commit to be False for everything, so setting it using the 
Session constructor wouldn't work properly. If a session was created prior 
to the one that needed that flag, it'd give me a ProtocolError since it 
couldn't change the session after it'd already been created. Manually 
setting the expire_on_commit attribute in the session and setting it back 
after it was done worked fine, though, and didn't mess with the scoped 
session pool:

with db_session() as db:
db.expire_on_commit = False
# do stuff
db.expire_on_commit = True

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.

2014-10-26 Thread James Meneghello
The application I'm working on operates over extremely large datasets, so 
I'm using the query windowing from here 
(https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery) 
to break it into manageable chunks. The query window is usually around 10k 
rows, after which it updates/deletes some rows and continues on. Simple 
breakdown is like this:

query = session.query(Item).filter(...several filters)
total_items = query.count() # used for logging
for row in windowed_query(query, Item.id, 1):
count += 1

# process, determine whether to keep (and update) or delete (put in a 
list for batch-deletion)
# one such example is:
if row.group_name != regex.group_name:
continue

if count = 1:
save(items) # items to be kept, issues updates
deleted = 
db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch')
session.commit()
count = 0

This works fine until it's gone through a save/delete cycle. Once it's 
saved, it goes back to access the windowed query again and pull the next 
10k rows. This works until the following line:

if row.group_name != regex.group_name:

At which point sqla will emit a SELECT for the item of that specific ID, 
presumably because the group_name wasn't available and it had to fetch it. 
This only occurs after the commit - so I assume that committing the session 
is breaking the query. Hence, for the next 10k rows, it emits 10k queries 
(one per row).

Because the script is potentially processing so many rows, I don't want to 
let the dead_items list grow to be massive, so the deletes need to occur 
fairly regularly throughout the process.

Any idea what's causing this / how to fix it? Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.

2014-10-26 Thread Michael Bayer

 On Oct 26, 2014, at 5:25 AM, James Meneghello murod...@gmail.com wrote:
 
 The application I'm working on operates over extremely large datasets, so I'm 
 using the query windowing from here 
 (https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery)
  to break it into manageable chunks. The query window is usually around 10k 
 rows, after which it updates/deletes some rows and continues on. Simple 
 breakdown is like this:
 
 query = session.query(Item).filter(...several filters)
 total_items = query.count() # used for logging
 for row in windowed_query(query, Item.id, 1):
 count += 1
 
 # process, determine whether to keep (and update) or delete (put in a 
 list for batch-deletion)
 # one such example is:
 if row.group_name != regex.group_name:
 continue
 
 if count = 1:
 save(items) # items to be kept, issues updates
 deleted = 
 db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch')
 session.commit()
 count = 0
 
 This works fine until it's gone through a save/delete cycle. Once it's saved, 
 it goes back to access the windowed query again and pull the next 10k rows. 
 This works until the following line:
 
 if row.group_name != regex.group_name:
 
 At which point sqla will emit a SELECT for the item of that specific ID, 
 presumably because the group_name wasn't available and it had to fetch it. 
 This only occurs after the commit - so I assume that committing the session 
 is breaking the query. Hence, for the next 10k rows, it emits 10k queries 
 (one per row).

commit() will expire everything in the Session by default, meaning all the 
attributes on objects gets cleared, and will emit a SELECT on next access.   
seems like you definitely want to disable this, which is the expire_on_commit 
setting of the Session itself; or otherwise, rearrange the logic here such that 
when you call session.commit(), you are definitely done with the objects in 
that batch of 1.   Usually with batching, things are organized this way so 
that you only have the 10K objects at most at once in memory; the commit() 
closes out dealing with those 10K objects.   In this case it seems like you are 
accessing the full range of 10K objects after that commit().  If it can’t be 
avoided then you’d have to create a Session with expire_on_commit=False, right 
now that setting is per-Session only.   it may be possible to do it just for a 
range if you just set the attribute on the Session also but I haven’t tried 
that.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.