Re: [sqlalchemy] session.expire{,_all}_unmodified for dropping objects that are unchanged?!
Hello Mike, first, thanks for considering my ramblings and thanks for the reply!

On 17.01.2018 23:23, Mike Bayer wrote:

> this is overkill. Here's what your code looks like without an ORM:

Yes, I think my message was lost in translation, or rather in oversimplification.

> this is very specific to the very quirky and odd thing I'm doing in
> the first place, which is that I'm creating an endlessly long linked
> list - I detach the element.

The actual code does not create an endlessly long linked list of course, it's more like a directed acyclic graph. Nodes may have many outgoing edges, so it's not so simple to know where to cut :-)

>> This made me think about our usage of "session.expire": Mostly, we use it to
>> tell SQLAlchemy that this is a good time to forget this object in the
>> expectation that there are no pending changes. I found 13 calls to
>> session.expire in our code base.

> The way that loop is structured, I would not want to expire all
> attributes that have been flushed. When you make a new Change(), it
> has a predecessor for which it needs to locate its primary key in
> order to assemble it into the foreign key column of the successor. If
> you've expired that Change() object completely, you are forcing the
> ORM to un-expire the object using another SELECT to ensure it hasn't
> been DELETEd, so that it can use the Change.id value. The more

True. And I admit that I did not think about cutting the links, I just wanted to drop objects (and all related links) that are no longer needed. I actually prefetch the primary keys of all objects touched or referred to by a batch of incoming objects. After processing the batch, I drop all objects not referred to by the next incoming batch and load those required for processing the next batch.

> you can tell if an object is part of that "modified" collection using
> instance_state(obj).modified - if that's false, it's definitely clean.
> _modified isn't public. You can also look inside of session.dirty.
Thanks for the pointer, I actually looked at the implementation of session.dirty to find the internal _modified. I replaced this with instance_state(obj).modified.

> ".new" or ".deleted"). But the most in-depth method,
> session.is_modified(), does that state.modified check first before
> digging in, so since you can likely tolerate false positives you can
> stick with that.

False positives are not an issue; if an instance is kept for one extra batch, it will most likely be dropped in the next one.

> it's "safe" but as before if the object is involved in other objects
> still to be flushed, a new SELECT will need to be emitted to unexpire
> the object.

I dropped safe_expire_all, it was only for completeness. About the extra SELECT for unexpiring: I will gladly take that performance hit if it keeps me from reintroducing that bug. Calling expire() for an instance that had pending updates was (obviously) a bug in my application. I just wanted to go one step further than fixing the spot where it appeared and make it harder to go there again.

> the case you illustrate is surprising and unusual. I would mark it
> up to a specific case you have here where you happen to be building up
> an endlessly long chain of objects for which the issue of memory is
> most cleanly handled by an explicit periodic break of the chain.

The initial code was better here, because session.commit() would have done that for us via the implicit expire_all directly after the flush. As you noted above, this led to a number of extra SELECTs to unexpire all objects still needed.

BTW: Our application is a rich client where the user interface also keeps mapped objects loaded in an ORM session, since reloading is quite expensive. Updates are done mostly in worker threads and processes and broadcast to all database sessions. For updating the display, all modified objects are expired and refreshed from the database.
This is where I actually messed up and refreshed from the DB inside a running transaction - before flushing the changes entered by the user. m( Finding the cause made me think of other misuses of expiration.

>> Note that the documentation does not warn about dropping pending changes due
>> to calls to expire.

> it's mentioned here:
> http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#session-expire
> but just not in any kind of boldface:

Oops, sorry, I only checked the reference documentation.

> I actually probably wouldn't make expire() do what it does today, e.g.
> expire un-flushed changes. When expire() was first created it was
> likely not possible or very easy to hit only DB-persisted attributes.
> The attribute history system was rewritten several times since then
> and it probably would be pretty simple today to add a flag expire(obj,
> persistent_only=True) or something like that. But it's one of those
> things where when someone needs that, I want to see why, because I
> think there's usually another way to handle the situation.
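[Editor's note] The hypothetical expire(obj, persistent_only=True) that Mike mentions could probably be approximated from the outside today. This is only a sketch, assuming that InstanceState.unmodified and InstanceState.dict behave as documented; the Thing model and attribute names are invented purely for illustration:

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Thing(Base):  # hypothetical model, just for the demonstration
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    note = Column(String)

def expire_unmodified_attributes(session, obj):
    """Rough approximation of expire(obj, persistent_only=True):
    expire only loaded attributes that carry no uncommitted changes,
    leaving pending modifications in place."""
    state = inspect(obj)
    # keys that are both loaded (in the instance dict) and unchanged
    clean = state.unmodified & set(state.dict)
    if clean:
        session.expire(obj, list(clean))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

t = Thing(name="a", note="x")
session.add(t)
session.commit()
t.name, t.note               # load all attributes after the commit-expire
t.name = "b"                 # pending change on "name" only

expire_unmodified_attributes(session, t)
assert t.name == "b"                  # the pending change survived
assert "note" not in inspect(t).dict  # the clean attribute was expired
```

Accessing `t.note` afterwards would emit a SELECT to un-expire it, exactly the trade-off discussed above.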
Re: [sqlalchemy] session.expire{,_all}_unmodified for dropping objects that are unchanged?!
On Wed, Jan 17, 2018 at 11:06 AM, Torsten Landschoff wrote:

> Hello world,
>
> tl;dr: I would like to have the ability to expire objects from the session
> but only if they aren't new/dirty/deleted.
>
> Have a look at this example code:
>
> import random
>
> from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
> from sqlalchemy.orm import relationship, backref, sessionmaker
> from sqlalchemy.ext.declarative import declarative_base
>
> Base = declarative_base()
>
> lorem_ipsum = """Lorem ipsum dolor sit amet, consectetur [...]""" * 42
>
> class Change(Base):
>     __tablename__ = "change"
>     id = Column(Integer, primary_key=True)
>     predecessor_id = Column(Integer, ForeignKey("change.id"))
>     comment = Column(String)
>
>     predecessor = relationship("Change", foreign_keys=[predecessor_id],
>                                uselist=False)
>
>     def __repr__(self):
>         return "<{0} #{1}: {2}>".format(self.__class__.__name__, self.id,
>                                         self.comment)
>
> engine = create_engine("sqlite:///demo.db", echo=False)
> Base.metadata.create_all(engine)
>
> session = sessionmaker(engine)()
>
> current_change = Change(comment="Initial commit")
> session.add(current_change)
>
> for batch in range(42):
>     for x in range(1000):
>         current_change = Change(
>             predecessor=current_change,
>             comment="Squashed bug #{0}\n\n{1}".format(
>                 random.randrange(0, 100), lorem_ipsum)
>         )
>         session.add(current_change)
>     session.flush()
>
> session.commit()
>
> Running this (with the complete lorem ipsum text) on my machine, I get this:
>
> $ ulimit -v 1024000
> $ python safe_expire.py
> Traceback (most recent call last):
>   File "safe_expire.py", line 61, in
>     random.randrange(0, 100), lorem_ipsum)
> MemoryError
>
> Quite obviously the problem here is that all the mapped objects are kept in
> memory even after being flushed (and before being committed).
> This is just a contrived example to show the effect I had in our application
> which (among other things) implements a DVCS on top of SQLAlchemy (on top of
> a DB).
> Our real life use is not as simple; basically incoming objects are inserted
> into the database bottom up (in SQL insertion order, because of foreign key
> constraints).
> For performance reasons the code processes incoming changes in batches and
> tries to keep the objects relevant for relationships and sanity checking in
> the database session - for us, session.expire_on_commit is set to False.
>
> During the batch operation the code would run out of memory, because nothing
> is expired automatically. This was a surprise to me, because the session
> only keeps a weak reference to its mapped objects. It seems like the
> relationships between objects keep them alive. As updates are processed
> bottom up this basically means that no mapped objects are ever dropped.
>
> This is easy to fix by manually expiring old entries from the session, like
> this:
>
> diff --git a/baseline.py b/baseline.py
> index 44e1b7a..df4eb24 100644
> --- a/baseline.py
> +++ b/baseline.py
> @@ -50,7 +50,9 @@ current_change = Change(comment="Initial commit")
>  session.add(current_change)
>
>  for batch in range(42):
> +    old_entries = []
>      for x in range(1000):
> +        old_entries.append(current_change)
>          current_change = Change(
>              predecessor=current_change,
>              comment="Squashed bug #{0}\n\n{1}".format(
> @@ -59,4 +61,7 @@ for batch in range(42):
>          session.add(current_change)
>      session.flush()
>
> +    for entry in old_entries:
> +        session.expire(entry)
> +
>  session.commit()
>
> So this is what I did.

this is overkill. Here's what your code looks like without an ORM:

class Link(object):
    def __init__(self, previous=None):
        self.previous = previous

current_link = Link()
print("Current link! %s" % current_link)

for batch in range(42):
    for x in range(1000):
        current_link = Link(current_link)
        print("Current link! %s" % current_link)

This code has the exact same problem, without any expire() or anything like that. So if you were say writing these objects to a file or something like that, how would you deal with this loop building up an enormous chain of un-collectable Link objects? If it were me, I'd simply do this:

for batch in range(42):
    for x in range(1000):
        current_link = Link(current_link)
        print("Current link! %s" % current_link)
    current_link.previous = None

this is very specific to the very quirky and odd thing I'm doing in the first place, which is that I'm creating an endlessly long linked list - I detach the element. The analogue with the ORM would be to expire this specific link, which we can do each flush so that the entire previous chain goes away:

for batch in range(42):
    for x in range(1000):
        current_change = Change(
            predecessor=current_change,
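[Editor's note] The chain-breaking point above can be demonstrated without SQLAlchemy at all: the oldest link stays alive only through the chain of .previous references, and breaking a single link frees everything behind it. A minimal sketch using a weak reference as a probe:

```python
import gc
import weakref

class Link(object):
    def __init__(self, previous=None):
        self.previous = previous

head = Link()
probe = weakref.ref(head)   # watch the oldest link without keeping it alive

current = head
for _ in range(1000):
    current = Link(current)
del head

gc.collect()
assert probe() is not None  # the .previous chain keeps every link alive

current.previous = None     # break the chain, as detaching/expiring would
gc.collect()
assert probe() is None      # the whole chain became collectable
```

This mirrors the session behaviour the original post describes: the identity map's weak references are irrelevant as long as relationship attributes hold strong references along the chain.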
[sqlalchemy] session.expire{,_all}_unmodified for dropping objects that are unchanged?!
Hello world,

tl;dr: I would like to have the ability to expire objects from the session but only if they aren't new/dirty/deleted.

Have a look at this example code:

import random

from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
from sqlalchemy.orm import relationship, backref, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

lorem_ipsum = """Lorem ipsum dolor sit amet, consectetur [...]""" * 42

class Change(Base):
    __tablename__ = "change"
    id = Column(Integer, primary_key=True)
    predecessor_id = Column(Integer, ForeignKey("change.id"))
    comment = Column(String)

    predecessor = relationship("Change", foreign_keys=[predecessor_id],
                               uselist=False)

    def __repr__(self):
        return "<{0} #{1}: {2}>".format(self.__class__.__name__, self.id,
                                        self.comment)

engine = create_engine("sqlite:///demo.db", echo=False)
Base.metadata.create_all(engine)

session = sessionmaker(engine)()

current_change = Change(comment="Initial commit")
session.add(current_change)

for batch in range(42):
    for x in range(1000):
        current_change = Change(
            predecessor=current_change,
            comment="Squashed bug #{0}\n\n{1}".format(
                random.randrange(0, 100), lorem_ipsum)
        )
        session.add(current_change)
    session.flush()

session.commit()

Running this (with the complete lorem ipsum text) on my machine, I get this:

> $ ulimit -v 1024000
> $ python safe_expire.py
> Traceback (most recent call last):
>   File "safe_expire.py", line 61, in
>     random.randrange(0, 100), lorem_ipsum)
> MemoryError

Quite obviously the problem here is that all the mapped objects are kept in memory even after being flushed (and before being committed). This is just a contrived example to show the effect I had in our application which (among other things) implements a DVCS on top of SQLAlchemy (on top of a DB).

Our real life use is not as simple; basically incoming objects are inserted into the database bottom up (in SQL insertion order, because of foreign key constraints).
For performance reasons the code processes incoming changes in batches and tries to keep the objects relevant for relationships and sanity checking in the database session - for us, session.expire_on_commit is set to False.

During the batch operation the code would run out of memory, because nothing is expired automatically. This was a surprise to me, because the session only keeps a weak reference to its mapped objects. It seems like the relationships between objects keep them alive. As updates are processed bottom up this basically means that no mapped objects are ever dropped.

This is easy to fix by manually expiring old entries from the session, like this:

diff --git a/baseline.py b/baseline.py
index 44e1b7a..df4eb24 100644
--- a/baseline.py
+++ b/baseline.py
@@ -50,7 +50,9 @@ current_change = Change(comment="Initial commit")
 session.add(current_change)

 for batch in range(42):
+    old_entries = []
     for x in range(1000):
+        old_entries.append(current_change)
         current_change = Change(
             predecessor=current_change,
             comment="Squashed bug #{0}\n\n{1}".format(
@@ -59,4 +61,7 @@ for batch in range(42):
         session.add(current_change)
     session.flush()

+    for entry in old_entries:
+        session.expire(entry)
+
 session.commit()

So this is what I did. This works fine, until somebody (maybe myself) made changes to the old objects without an intermediate flush. This was really hard to find; somehow updates never made it to the database. SQLAlchemy did not even execute the expected SQL.

This made me think about our usage of "session.expire": Mostly, we use it to tell SQLAlchemy that this is a good time to forget this object in the expectation that there are no pending changes. I found 13 calls to session.expire in our code base. So I would like to replace it with something that gives SQLAlchemy a hint that it probably is a good time to drop these things. I came up with this code:

def safe_expire(session, o):
    """
    Drop *o* from *session* if it has no pending changes.
    """
    if o in session.identity_map._modified:
        return False
    else:
        session.expire(o)
        return True

def safe_expire_all(session):
    """
    Drop all objects with no pending changes from *session*.
    """
    id_map = session.identity_map
    unmodified = [o for key, o in id_map.iteritems()
                  if o not in id_map._modified]
    for o in unmodified:
        session.expire(o)
    return unmodified

Does this look like a safe thing to do? (I wonder about updated relationships etc.) Am I alone in running into adverse effects of session.expire?

Note that the documentation does not warn about dropping pending changes due to calls to expire. In hindsight this seems quite obvious, but I did not think about this when writing that code (because I knew the objects in question are clean anyway).

Greetings and thanks for any insights,
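[Editor's note] The safe_expire above can be written without reaching into the private identity_map._modified collection, using the public inspection API instead. A sketch, under the assumption that inspect(obj).modified reflects pending attribute changes; the Thing model is invented for the demonstration:

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Thing(Base):  # minimal stand-in model for the sketch
    __tablename__ = "thing"
    id = Column(Integer, primary_key=True)
    name = Column(String)

def safe_expire(session, obj):
    """Expire obj only if it carries no pending changes."""
    state = inspect(obj)
    if state.modified or obj in session.new or obj in session.deleted:
        return False
    session.expire(obj)
    return True

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

t = Thing(name="clean")
session.add(t)
session.commit()                         # t is persistent and clean

assert safe_expire(session, t) is True   # clean -> expired

t.name = "dirty"                         # a pending change
assert safe_expire(session, t) is False  # pending change -> refused
assert t.name == "dirty"                 # the change is still there
```

As noted in the replies above, session.is_modified() performs the same state.modified check first, so it is an equally valid (and fully public) gate when false positives are tolerable.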