[sqlalchemy] Unifying large objects on commit/flush
Hi there! We are considering to use SQLAlchemy for a new project. Our first tests look promising and it is a fun to use SA. But I still have a problem to implement some special features we would like to have. For example, I want to store large (tens to hundreds of MB) objects into the database but keep the option open to store them into the filesystem later. I would prefer storing them into the DB to have them under transaction protection but OTOH I know that this can become a performance problem. So I want to build an interface to allow external (file or extra DB) storage later. So instead of the blob itself I want to store a cryptographic hash (like git, Mercurial, Fossil SCM etc. do) and index the real data from that. If somebody tries to import the same file twice, it should just reuse the existing blob (and possibly sanity check if the content matches). The following example is a greatly simplified example of that approach. It works like this, but I would like to do without the exception handler at the end ;-) -- import sqlalchemy as sa import sqlalchemy.orm as orm from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.ext.associationproxy import association_proxy import hashlib Base = declarative_base() class CommonStorage(Base): __tablename__ = common hashval = sa.Column(sa.String, primary_key=True) value = sa.Column(sa.LargeBinary) def __init__(self, v): self.value = v self.hashval = hashlib.md5(v).hexdigest() class StorageUser(Base): __tablename__ = user id= sa.Column(sa.Integer, primary_key=True) ref = sa.Column(None, sa.ForeignKey(CommonStorage.hashval)) rel = orm.relation(CommonStorage) value = association_proxy(rel, value) engine = sa.create_engine(sqlite:///, echo=True) Base.metadata.create_all(engine) session = orm.sessionmaker(bind=engine, autoflush=False)() ua, ub = StorageUser(), StorageUser() ua.value = ub.value = Something session.add(ua) session.commit() session.add(ub) try: session.commit() except sa.exc.FlushError: # I really like this error handling - we have to rollback first to # acknowledge the error. Cool! Never ignore errors and carry on again! session.rollback() ub.rel = session.query(CommonStorage).filter(ub.rel.hashval == CommonStorage.hashval).first() session.add(ub) session.commit() --- I tried using a MapperExtension on the StorageUser and to replace the ref inside before_insert and before_update by searching for a matching hash: - class StorageExtension(interfaces.MapperExtension): def before_insert(self, mapper, connection, instance): if instance.rel != None: sess = orm.object_session(instance) existing = sess.query(CommonStorage).filter(instance.rel.hashval == CommonStorage.hashval).first() if existing != None: instance.rel = existing before_update = before_insert class StorageUser(Base): __tablename__ = user +__mapper_args__ = dict(extension=StorageExtension()) - While the extension gets called and tries to replace the relation, SA still tries to insert the new entry. Any way to get this implemented? Greetings and thanks for any hint, Torsten -- DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH Torsten Landschoff Office Dresden Tel: +49-(0)351-4519587 Fax: +49-(0)351-4519561 mailto:torsten.landsch...@dynamore.de http://www.dynamore.de Registergericht: Mannheim, HRB: 109659, Sitz: Karlsruhe, Geschäftsführer: Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
Re: [sqlalchemy] Unifying large objects on commit/flush
Torsten Landschoff wrote: While the extension gets called and tries to replace the relation, SA still tries to insert the new entry. Any way to get this implemented? here's the relevant bit of documentation: http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert Column-based attributes can be modified within this method which will result in the new value being inserted. However **no** changes to the overall flush plan can be made, and manipulation of the Session will not have the desired effect. To manipulate the Session within an extension, use SessionExtension. so here you need to use SessionExtension as you'd like to manipulate the flush plan. Greetings and thanks for any hint, Torsten -- DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH Torsten Landschoff Office Dresden Tel: +49-(0)351-4519587 Fax: +49-(0)351-4519561 mailto:torsten.landsch...@dynamore.de http://www.dynamore.de Registergericht: Mannheim, HRB: 109659, Sitz: Karlsruhe, Geschäftsführer: Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
Re: [sqlalchemy] Unifying large objects on commit/flush
Hi Michael, On Fri, 2010-03-26 at 14:30 -0400, Michael Bayer wrote: here's the relevant bit of documentation: http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert Column-based attributes can be modified within this method which will result in the new value being inserted. However **no** changes to the overall flush plan can be made, and manipulation of the Session will not have the desired effect. To manipulate the Session within an extension, use SessionExtension. Thanks for the pointer. I read that part before but was not sure if I have to modify the flush plan. I attached the modified source code that actually works. I dislike this solution for the following reasons: * The extension scans through all new instances which could be quite a number. * The session must be modified (okay, no real problem). * In case multiple classes use the CommonStorage class, the StorageExtension must be adjusted. It would be better to operate on CommonStorage instances but I don't know how to find the related classes before the whole thing goes to the database. Another question: Any idea when the second SA book will be published? I bought the Essential SA book but it is a bit outdated covering 0.4.x. Thanks, Torsten -- DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH Torsten Landschoff Office Dresden mailto:torsten.landsch...@dynamore.de http://www.dynamore.de Registration court: Mannheim, HRB: 109659, based in Karlsruhe, Managing director: Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en. import sqlalchemy as sa import sqlalchemy.orm as orm import sqlalchemy.orm.interfaces as interfaces from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.ext.associationproxy import association_proxy import hashlib Base = declarative_base() class CommonStorage(Base): __tablename__ = common hashval = sa.Column(sa.String, primary_key=True) value = sa.Column(sa.LargeBinary) def __init__(self, v): self.value = v self.hashval = hashlib.md5(v).hexdigest() def joinFrom(self, session): copy = session.query(CommonStorage).filter(self.hashval==CommonStorage.hashval).first() return copy or self class StorageUser(Base): __tablename__ = user id = sa.Column(sa.Integer, primary_key=True) ref = sa.Column(None, sa.ForeignKey(CommonStorage.hashval)) rel = orm.relation(CommonStorage) value = association_proxy(rel, value) class StorageExtension(interfaces.SessionExtension): def before_flush(self, session, flush_context, instances=None): for d in session.new: if isinstance(d, StorageUser) and d.rel is not None: original = d.rel d.rel = d.rel.joinFrom(session) if original is not d.rel: session.expunge(original) engine = sa.create_engine(sqlite:///, echo=True) Base.metadata.create_all(engine) session = orm.sessionmaker(bind=engine, extension=StorageExtension(), autoflush=False)() ua, ub = StorageUser(), StorageUser() ua.value = ub.value = Something session.add(ua) session.commit() session.add(ub) session.commit()