Re: [sqlalchemy] Question on session.expunge.all()

2011-09-07 Thread Vlad K.


Great, thanks!


.oO V Oo.


On 09/06/2011 04:48 PM, Michael Bayer wrote:

On Sep 6, 2011, at 10:40 AM, Vlad K. wrote:


I have a products database which is daily syncronized with an external source 
via a csv file. There are several thousand rows in question. The 
synchronization does two things:

1. Update only price if changed for existing products
2. Insert new products if they don't exist with all fields from csv

But basically, for each row in the csv, after the row is processed (one of the 
above two things is done), I don't need the object in session anymore. Memory 
and performance are of course an issue, and I can't find a way to test memory 
consumption with or without expunge_all() so my questions are:

1. Do I need to session.expunge_all() after each csv row is processed, or are 
they automatically garbage collected?
2. Is there any significant overhead inherent in expunge_all() that I'm not 
seeing right now?

Performance-wise, it seems the task is complete in more or less same time with 
or without expunge_all()

In modern SQLAlchemy, the Session maintains only weak references to objects that are 
clean, that is, are persistent in the database and have no pending changes to 
be flushed.As all references to them are lost, they are garbage collected by the 
Python interpreter.Note that objects are strongly referenced when they are present in 
the collection or attribute of a parent object, until that parent is also garbage 
collected.There is an overhead to process which occurs when the object is 
dereferenced and removed from the session (weakref callbacks handle the accounting).  But 
calling expunge_all() probably isn't doing much here as the objects are likely being 
cleaned out in the same way regardless.


While I'm at it, I also need to delete rows in the database that do not have 
corresponding row in the csv file (say linked by csv_key field), the first 
solution that comes to mind is building a list of keys in the csv file (few 
thousand keys) and then doing:

session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()

I believe there is less overhead in sending such a large (but single!) query to 
the database and leaving it to determine what to delete by itself, than 
selecting each row in the database and checking if its csv_key exists in the 
csv_keys list on the application side and then issuing delete statements for 
rows that matched the criteria. Am I wrong?

That's definitely a dramatically faster way to do things, rather than to load each record 
individually and mark as deleted - it's the primary reason delete() and update() are 
there.   You'll probably want to send False as the value of 
synchronize_session to the delete() call so that it doesn't go through the effort of 
locating local records that were affected (unless you need that feature).



--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



[sqlalchemy] Question on session.expunge.all()

2011-09-06 Thread Vlad K.


I have a products database which is daily syncronized with an external 
source via a csv file. There are several thousand rows in question. The 
synchronization does two things:


1. Update only price if changed for existing products
2. Insert new products if they don't exist with all fields from csv

But basically, for each row in the csv, after the row is processed (one 
of the above two things is done), I don't need the object in session 
anymore. Memory and performance are of course an issue, and I can't find 
a way to test memory consumption with or without expunge_all() so my 
questions are:


1. Do I need to session.expunge_all() after each csv row is processed, 
or are they automatically garbage collected?
2. Is there any significant overhead inherent in expunge_all() that I'm 
not seeing right now?


Performance-wise, it seems the task is complete in more or less same 
time with or without expunge_all()




While I'm at it, I also need to delete rows in the database that do not 
have corresponding row in the csv file (say linked by csv_key field), 
the first solution that comes to mind is building a list of keys in the 
csv file (few thousand keys) and then doing:


session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()

I believe there is less overhead in sending such a large (but single!) 
query to the database and leaving it to determine what to delete by 
itself, than selecting each row in the database and checking if its 
csv_key exists in the csv_keys list on the application side and then 
issuing delete statements for rows that matched the criteria. Am I wrong?


(I can't truncate table and reinsert products for other reasons)


The database is Postgresql.


Thanks!


--

.oO V Oo.

--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Question on session.expunge.all()

2011-09-06 Thread Michael Bayer

On Sep 6, 2011, at 10:40 AM, Vlad K. wrote:

 
 I have a products database which is daily syncronized with an external source 
 via a csv file. There are several thousand rows in question. The 
 synchronization does two things:
 
 1. Update only price if changed for existing products
 2. Insert new products if they don't exist with all fields from csv
 
 But basically, for each row in the csv, after the row is processed (one of 
 the above two things is done), I don't need the object in session anymore. 
 Memory and performance are of course an issue, and I can't find a way to test 
 memory consumption with or without expunge_all() so my questions are:
 
 1. Do I need to session.expunge_all() after each csv row is processed, or are 
 they automatically garbage collected?
 2. Is there any significant overhead inherent in expunge_all() that I'm not 
 seeing right now?
 
 Performance-wise, it seems the task is complete in more or less same time 
 with or without expunge_all()

In modern SQLAlchemy, the Session maintains only weak references to objects 
that are clean, that is, are persistent in the database and have no pending 
changes to be flushed.As all references to them are lost, they are garbage 
collected by the Python interpreter.Note that objects are strongly 
referenced when they are present in the collection or attribute of a parent 
object, until that parent is also garbage collected.There is an overhead to 
process which occurs when the object is dereferenced and removed from the 
session (weakref callbacks handle the accounting).  But calling expunge_all() 
probably isn't doing much here as the objects are likely being cleaned out in 
the same way regardless.

 
 While I'm at it, I also need to delete rows in the database that do not have 
 corresponding row in the csv file (say linked by csv_key field), the first 
 solution that comes to mind is building a list of keys in the csv file (few 
 thousand keys) and then doing:
 
 session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()
 
 I believe there is less overhead in sending such a large (but single!) query 
 to the database and leaving it to determine what to delete by itself, than 
 selecting each row in the database and checking if its csv_key exists in the 
 csv_keys list on the application side and then issuing delete statements for 
 rows that matched the criteria. Am I wrong?

That's definitely a dramatically faster way to do things, rather than to load 
each record individually and mark as deleted - it's the primary reason delete() 
and update() are there.   You'll probably want to send False as the value of 
synchronize_session to the delete() call so that it doesn't go through the 
effort of locating local records that were affected (unless you need that 
feature).

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.