[sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.

2014-10-26 Thread James Meneghello
The application I'm working on operates over extremely large datasets, so 
I'm using the query windowing from here 
(https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery) 
to break it into manageable chunks. The query window is usually around 10k 
rows, after which it updates/deletes some rows and continues on. Simple 
breakdown is like this:

query = session.query(Item).filter(...several filters)
total_items = query.count() # used for logging
for row in windowed_query(query, Item.id, 1):
count += 1

# process, determine whether to keep (and update) or delete (put in a 
list for batch-deletion)
# one such example is:
if row.group_name != regex.group_name:
continue

if count = 1:
save(items) # items to be kept, issues updates
deleted = 
db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch')
session.commit()
count = 0

This works fine until it's gone through a save/delete cycle. Once it's 
saved, it goes back to access the windowed query again and pull the next 
10k rows. This works until the following line:

if row.group_name != regex.group_name:

At which point sqla will emit a SELECT for the item of that specific ID, 
presumably because the group_name wasn't available and it had to fetch it. 
This only occurs after the commit - so I assume that committing the session 
is breaking the query. Hence, for the next 10k rows, it emits 10k queries 
(one per row).

Because the script is potentially processing so many rows, I don't want to 
let the dead_items list grow to be massive, so the deletes need to occur 
fairly regularly throughout the process.

Any idea what's causing this / how to fix it? Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.

2014-10-26 Thread Michael Bayer

 On Oct 26, 2014, at 5:25 AM, James Meneghello murod...@gmail.com wrote:
 
 The application I'm working on operates over extremely large datasets, so I'm 
 using the query windowing from here 
 (https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery)
  to break it into manageable chunks. The query window is usually around 10k 
 rows, after which it updates/deletes some rows and continues on. Simple 
 breakdown is like this:
 
 query = session.query(Item).filter(...several filters)
 total_items = query.count() # used for logging
 for row in windowed_query(query, Item.id, 1):
 count += 1
 
 # process, determine whether to keep (and update) or delete (put in a 
 list for batch-deletion)
 # one such example is:
 if row.group_name != regex.group_name:
 continue
 
 if count = 1:
 save(items) # items to be kept, issues updates
 deleted = 
 db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch')
 session.commit()
 count = 0
 
 This works fine until it's gone through a save/delete cycle. Once it's saved, 
 it goes back to access the windowed query again and pull the next 10k rows. 
 This works until the following line:
 
 if row.group_name != regex.group_name:
 
 At which point sqla will emit a SELECT for the item of that specific ID, 
 presumably because the group_name wasn't available and it had to fetch it. 
 This only occurs after the commit - so I assume that committing the session 
 is breaking the query. Hence, for the next 10k rows, it emits 10k queries 
 (one per row).

commit() will expire everything in the Session by default, meaning all the 
attributes on objects gets cleared, and will emit a SELECT on next access.   
seems like you definitely want to disable this, which is the expire_on_commit 
setting of the Session itself; or otherwise, rearrange the logic here such that 
when you call session.commit(), you are definitely done with the objects in 
that batch of 1.   Usually with batching, things are organized this way so 
that you only have the 10K objects at most at once in memory; the commit() 
closes out dealing with those 10K objects.   In this case it seems like you are 
accessing the full range of 10K objects after that commit().  If it can’t be 
avoided then you’d have to create a Session with expire_on_commit=False, right 
now that setting is per-Session only.   it may be possible to do it just for a 
range if you just set the attribute on the Session also but I haven’t tried 
that.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Eager Loading of AssociationProxy (Generic Association with Discriminator on Association)

2014-10-26 Thread Michael Bayer

 On Oct 26, 2014, at 12:07 AM, Victor Reichert vfr...@gmail.com wrote:
 
 Hi,
 
 I am following the Generic Association with Discriminator on Association 
 example at:
 
 http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/discriminator_on_association.html
  
 http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/discriminator_on_association.html
 
 However, I would like to eager load the customer.addresses in a query like
 
 eager_sales_persons = 
 session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.addresses))
 
 #with SalesPerson being a class I added with a relationship to customers.
 
 However, the above statement raises:  'AssociationProxy' object has no 
 attribute 'property'
 
 I tried eager_sales_persons = 
 session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.address_association).joinedload(AddressAssociation.addresses)).all()
 
 however, it would emit SQL for customer.addresses.
 
 I have made a pastie with my code at: http://pastie.org/9676017 
 http://pastie.org/9676017
 Is there a loader strategy that would work for my situation or a work around 
 for the AssociationProxy error?
 
 

the second query is the right one, as currently there isn’t integration between 
an association proxy attribute and loader options, meaning, you have to state 
the joinedload() in terms of the actual relationship() as you are doing.

However I’m not seeing the problem:

if we load as :


session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.address_association).joinedload(AddressAssociation.addresses)).all()

which you can also state like this:


session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(address_association).joinedload(“addresses)).all()

setting echo=True on create_engine(), the main query is:

SELECT salesperson.id AS salesperson_id, salesperson.name AS salesperson_name, 
address_association_1.id AS address_association_1_id, 
address_association_1.discriminator AS address_association_1_discriminator, 
address_1.id AS address_1_id, address_1.association_id AS 
address_1_association_id, address_1.street AS address_1_street, address_1.city 
AS address_1_city, address_1.zip AS address_1_zip, customer_1.id AS 
customer_1_id, customer_1.name AS customer_1_name, customer_1.sales_person_id 
AS customer_1_sales_person_id, customer_1.address_association_id AS 
customer_1_address_association_id 
FROM salesperson LEFT OUTER JOIN customer AS customer_1 ON salesperson.id = 
customer_1.sales_person_id LEFT OUTER JOIN address_association AS 
address_association_1 ON address_association_1.id = 
customer_1.address_association_id LEFT OUTER JOIN address AS address_1 ON 
address_association_1.id = address_1.association_id


so in the FROM we have:   salesperson - customer - address_association - 
address

that’s correct.  you’ll note three joinedload() calls, three links (-).

then as the iteration proceeds, the sample calls upon address.parent.  This 
emits this query for two entries:

SELECT customer.id AS customer_id, customer.name AS customer_name, 
customer.sales_person_id AS customer_sales_person_id, 
customer.address_association_id AS customer_address_association_id 
FROM customer 
WHERE ? = customer.address_association_id

not sure if that’s the query you’re referring to.  That’s Address.parent, which 
is a proxy to CustomerAddressAssociation.parent, which is emitting a lazy load. 
  As this is the non-FK side of a one-to-one, that’s also correct.  A 
one-to-one is a special case of a one-to-many, basically, uselist=False means, 
fetch a one-to-many collection, but only deal with the first result.  
CustomerAddressAssociation.parent is the backref of the 
Customer.address_association many-to-one that’s stated in the joinedload().

if this were a non-generic mapping where Customer.addresses were a one-to-many 
and Address.parent were a many-to-one, you wouldn’t get that extra load for 
Address.parent; a pure many-to-one is implicitly retrieved from the database 
and cached that way when the one-to-many side is loaded.  but in this case the 
“generic” mapping has flipped this around so that Customer-AddressAssoiation 
is a many-to-one.







 Thank you so much for your help :)
 
 ~Victor
 
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+unsubscr...@googlegroups.com 
 mailto:sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com 
 mailto:sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy 
 http://groups.google.com/group/sqlalchemy.
 For more options, visit https://groups.google.com/d/optout 
 https://groups.google.com/d/optout.

-- 
You received this message because you are