[sqlalchemy] Re: Question on session.expunge.all()

2011-09-08 Thread Victor Olex
Since you are effectively overwriting the table with the new file
contents, the fastest approach may well be to truncate the table and
then insert all of the contents. If you were to just append and update,
then session.merge() is a convenient way to do this, though I am not
sure it is the fastest.
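The two strategies above can be sketched with the stdlib sqlite3 module (a simplified illustration; the thread itself uses SQLAlchemy, and the `products` table and csv rows here are hypothetical):

```python
# Sketch of the two sync strategies: full truncate-and-reload vs. a
# merge-style append/update pass. Stdlib sqlite3 stands in for the ORM;
# the table layout and data are made up for illustration.
import sqlite3

rows = [("sku-1", 9.99), ("sku-2", 19.99)]  # parsed from the csv

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO products VALUES ('sku-1', 5.00)")

# Strategy 1: truncate then bulk-insert everything (fastest when the
# csv fully replaces the table).
conn.execute("DELETE FROM products")  # sqlite has no TRUNCATE statement
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)

# Strategy 2: merge-style append/update, one row at a time -- roughly
# what session.merge() does per object.
for pid, price in [("sku-2", 17.99), ("sku-3", 29.99)]:
    updated = conn.execute(
        "UPDATE products SET price = ? WHERE product_id = ?", (price, pid)
    ).rowcount
    if not updated:
        conn.execute("INSERT INTO products VALUES (?, ?)", (pid, price))

print(sorted(conn.execute("SELECT * FROM products")))
```

Strategy 1 issues two statements total; strategy 2 issues one or two per row, which is why the truncate approach tends to win when the file really does replace the whole table.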

On Sep 7, 5:53 pm, Vlad K. v...@haronmedia.com wrote:
 Great, thanks!

 .oO V Oo.

 On 09/06/2011 04:48 PM, Michael Bayer wrote:

  On Sep 6, 2011, at 10:40 AM, Vlad K. wrote:

  I have a products database which is daily synchronized with an external 
  source via a csv file. There are several thousand rows in question. The 
  synchronization does two things:

  1. Update only price if changed for existing products
  2. Insert new products if they don't exist with all fields from csv

  But basically, for each row in the csv, after the row is processed (one of 
  the above two things is done), I don't need the object in session anymore. 
  Memory and performance are of course an issue, and I can't find a way to 
  test memory consumption with or without expunge_all() so my questions are:

  1. Do I need to call session.expunge_all() after each csv row is processed, 
  or are the objects automatically garbage collected?
  2. Is there any significant overhead inherent in expunge_all() that I'm 
  not seeing right now?

  Performance-wise, it seems the task is completed in more or less the same 
  time with or without expunge_all().

  In modern SQLAlchemy, the Session maintains only weak references to objects 
  that are clean, that is, persistent in the database with no pending changes 
  to be flushed. Once all references to them are lost, they are garbage 
  collected by the Python interpreter. Note that objects are strongly 
  referenced while they are present in a collection or attribute of a parent 
  object, until that parent is also garbage collected. There is some 
  processing overhead when an object is dereferenced and removed from the 
  session (weakref callbacks handle the accounting), but calling 
  expunge_all() probably isn't doing much here, as the objects are likely 
  being cleaned out in the same way regardless.
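The weak-referencing behavior described above can be illustrated with the stdlib weakref module; this is a simplified sketch of a weak-referencing identity map, not SQLAlchemy's actual internals:

```python
# Demonstrate how a weak-referencing map lets clean objects be garbage
# collected as soon as the application drops them, with no explicit
# expunge step. Mimics the Session behavior described in the thread.
import gc
import weakref

class Product:
    def __init__(self, product_id):
        self.product_id = product_id

# A WeakValueDictionary holds objects without keeping them alive,
# analogous to the Session's identity map for clean (flushed) instances.
identity_map = weakref.WeakValueDictionary()

obj = Product("sku-1")
identity_map["sku-1"] = obj
assert "sku-1" in identity_map

del obj       # the application drops its last strong reference
gc.collect()  # force collection (CPython usually reclaims immediately)

# The map no longer holds the instance -- no expunge_all() needed.
assert "sku-1" not in identity_map
```

This is why processing a csv row and then discarding the model instance is usually enough: once the row is flushed and the loop variable is rebound, nothing strong is left pointing at the object.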

  While I'm at it, I also need to delete rows in the database that do not 
  have a corresponding row in the csv file (say, linked by a csv_key field); 
  the first solution that comes to mind is building a list of keys in the csv 
  file (a few thousand keys) and then doing:

  session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()

  I believe there is less overhead in sending such a large (but single!) 
  query to the database and leaving it to determine what to delete by 
  itself, than selecting each row in the database and checking if its 
  csv_key exists in the csv_keys list on the application side and then 
  issuing delete statements for rows that matched the criteria. Am I wrong?
  That's definitely a dramatically faster way to do things than loading each 
  record individually and marking it as deleted - it's the primary reason 
  delete() and update() are there.   You'll probably want to send False as 
  the value of synchronize_session to the delete() call so that it doesn't go 
  through the effort of locating local records that were affected (unless you 
  need that feature).
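In SQL terms, the single bulk delete looks like the sketch below, again using stdlib sqlite3 for illustration (the table and keys are hypothetical). SQLAlchemy's query.delete(synchronize_session=False) emits one statement of this shape without loading or tracking the affected rows:

```python
# One DELETE ... NOT IN (...) statement lets the database decide which
# rows to remove, instead of selecting each row and deleting it from
# the application side. Illustrative schema and data only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("sku-1", 9.99), ("sku-2", 19.99), ("sku-3", 29.99)],
)

csv_keys = ["sku-1", "sku-3"]  # keys present in today's csv file

# Bind one placeholder per key; a few thousand parameters is still a
# single round trip to the database.
placeholders = ", ".join("?" for _ in csv_keys)
conn.execute(
    f"DELETE FROM products WHERE product_id NOT IN ({placeholders})", csv_keys
)

remaining = [r[0] for r in conn.execute("SELECT product_id FROM products ORDER BY 1")]
print(remaining)  # sku-2, absent from the csv, was deleted
```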

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Re: Question on session.expunge.all()

2011-09-08 Thread Vlad K.


No, I can't truncate the table for other reasons, as I mentioned in my 
original question. :)


The issue here was not how to sync the data, but whether processed rows 
stay in the session even though the objects (model instances) are discarded 
at the end of each iteration (each csv row); in other words, whether I 
have to call expunge_all() or not.


It seems I don't have to (SQLAlchemy 0.7, PostgreSQL backend).


Thanks!

.oO V Oo.


On 09/08/2011 06:47 PM, Victor Olex wrote:

Since you are effectively overwriting the table with the new file
contents, the fastest approach may well be to truncate the table and
then insert all of the contents. If you were to just append and update,
then session.merge() is a convenient way to do this, though I am not
sure it is the fastest.

