[sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.
The application I'm working on operates over extremely large datasets, so I'm using the query windowing from here (https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery) to break it into manageable chunks. The query window is usually around 10k rows, after which it updates/deletes some rows and continues on. Simple breakdown is like this: query = session.query(Item).filter(...several filters) total_items = query.count() # used for logging for row in windowed_query(query, Item.id, 1): count += 1 # process, determine whether to keep (and update) or delete (put in a list for batch-deletion) # one such example is: if row.group_name != regex.group_name: continue if count = 1: save(items) # items to be kept, issues updates deleted = db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch') session.commit() count = 0 This works fine until it's gone through a save/delete cycle. Once it's saved, it goes back to access the windowed query again and pull the next 10k rows. This works until the following line: if row.group_name != regex.group_name: At which point sqla will emit a SELECT for the item of that specific ID, presumably because the group_name wasn't available and it had to fetch it. This only occurs after the commit - so I assume that committing the session is breaking the query. Hence, for the next 10k rows, it emits 10k queries (one per row). Because the script is potentially processing so many rows, I don't want to let the dead_items list grow to be massive, so the deletes need to occur fairly regularly throughout the process. Any idea what's causing this / how to fix it? Thanks! -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
Re: [sqlalchemy] Windowed Queries breaking after a commit and emitting many, many Selects.
On Oct 26, 2014, at 5:25 AM, James Meneghello murod...@gmail.com wrote: The application I'm working on operates over extremely large datasets, so I'm using the query windowing from here (https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery) to break it into manageable chunks. The query window is usually around 10k rows, after which it updates/deletes some rows and continues on. Simple breakdown is like this: query = session.query(Item).filter(...several filters) total_items = query.count() # used for logging for row in windowed_query(query, Item.id, 1): count += 1 # process, determine whether to keep (and update) or delete (put in a list for batch-deletion) # one such example is: if row.group_name != regex.group_name: continue if count = 1: save(items) # items to be kept, issues updates deleted = db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch') session.commit() count = 0 This works fine until it's gone through a save/delete cycle. Once it's saved, it goes back to access the windowed query again and pull the next 10k rows. This works until the following line: if row.group_name != regex.group_name: At which point sqla will emit a SELECT for the item of that specific ID, presumably because the group_name wasn't available and it had to fetch it. This only occurs after the commit - so I assume that committing the session is breaking the query. Hence, for the next 10k rows, it emits 10k queries (one per row). commit() will expire everything in the Session by default, meaning all the attributes on objects gets cleared, and will emit a SELECT on next access. seems like you definitely want to disable this, which is the expire_on_commit setting of the Session itself; or otherwise, rearrange the logic here such that when you call session.commit(), you are definitely done with the objects in that batch of 1. Usually with batching, things are organized this way so that you only have the 10K objects at most at once in memory; the commit() closes out dealing with those 10K objects. In this case it seems like you are accessing the full range of 10K objects after that commit(). If it can’t be avoided then you’d have to create a Session with expire_on_commit=False, right now that setting is per-Session only. it may be possible to do it just for a range if you just set the attribute on the Session also but I haven’t tried that. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
Re: [sqlalchemy] Eager Loading of AssociationProxy (Generic Association with Discriminator on Association)
On Oct 26, 2014, at 12:07 AM, Victor Reichert vfr...@gmail.com wrote: Hi, I am following the Generic Association with Discriminator on Association example at: http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/discriminator_on_association.html http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/discriminator_on_association.html However, I would like to eager load the customer.addresses in a query like eager_sales_persons = session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.addresses)) #with SalesPerson being a class I added with a relationship to customers. However, the above statement raises: 'AssociationProxy' object has no attribute 'property' I tried eager_sales_persons = session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.address_association).joinedload(AddressAssociation.addresses)).all() however, it would emit SQL for customer.addresses. I have made a pastie with my code at: http://pastie.org/9676017 http://pastie.org/9676017 Is there a loader strategy that would work for my situation or a work around for the AssociationProxy error? the second query is the right one, as currently there isn’t integration between an association proxy attribute and loader options, meaning, you have to state the joinedload() in terms of the actual relationship() as you are doing. However I’m not seeing the problem: if we load as : session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(Customer.address_association).joinedload(AddressAssociation.addresses)).all() which you can also state like this: session.query(SalesPerson).options(joinedload(SalesPerson.customers).joinedload(address_association).joinedload(“addresses)).all() setting echo=True on create_engine(), the main query is: SELECT salesperson.id AS salesperson_id, salesperson.name AS salesperson_name, address_association_1.id AS address_association_1_id, address_association_1.discriminator AS address_association_1_discriminator, address_1.id AS address_1_id, address_1.association_id AS address_1_association_id, address_1.street AS address_1_street, address_1.city AS address_1_city, address_1.zip AS address_1_zip, customer_1.id AS customer_1_id, customer_1.name AS customer_1_name, customer_1.sales_person_id AS customer_1_sales_person_id, customer_1.address_association_id AS customer_1_address_association_id FROM salesperson LEFT OUTER JOIN customer AS customer_1 ON salesperson.id = customer_1.sales_person_id LEFT OUTER JOIN address_association AS address_association_1 ON address_association_1.id = customer_1.address_association_id LEFT OUTER JOIN address AS address_1 ON address_association_1.id = address_1.association_id so in the FROM we have: salesperson - customer - address_association - address that’s correct. you’ll note three joinedload() calls, three links (-). then as the iteration proceeds, the sample calls upon address.parent. This emits this query for two entries: SELECT customer.id AS customer_id, customer.name AS customer_name, customer.sales_person_id AS customer_sales_person_id, customer.address_association_id AS customer_address_association_id FROM customer WHERE ? = customer.address_association_id not sure if that’s the query you’re referring to. That’s Address.parent, which is a proxy to CustomerAddressAssociation.parent, which is emitting a lazy load. As this is the non-FK side of a one-to-one, that’s also correct. A one-to-one is a special case of a one-to-many, basically, uselist=False means, fetch a one-to-many collection, but only deal with the first result. CustomerAddressAssociation.parent is the backref of the Customer.address_association many-to-one that’s stated in the joinedload(). if this were a non-generic mapping where Customer.addresses were a one-to-many and Address.parent were a many-to-one, you wouldn’t get that extra load for Address.parent; a pure many-to-one is implicitly retrieved from the database and cached that way when the one-to-many side is loaded. but in this case the “generic” mapping has flipped this around so that Customer-AddressAssoiation is a many-to-one. Thank you so much for your help :) ~Victor -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com mailto:sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com mailto:sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout https://groups.google.com/d/optout. -- You received this message because you are