[sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Torsten Landschoff
Hi there!

We are considering using SQLAlchemy for a new project. Our first tests
look promising and it is fun to use SA. But I still have a problem
implementing some special features we would like to have.

For example, I want to store large (tens to hundreds of MB) objects in
the database, but keep the option open to store them in the filesystem
later. I would prefer storing them in the DB to have them under
transaction protection, but OTOH I know that this can become a
performance problem. So I want to build an interface that allows external
(file or extra DB) storage later.

So instead of the blob itself I want to store a cryptographic hash (like
git, Mercurial, Fossil SCM etc. do) and index the real data by that hash.
If somebody tries to import the same file twice, it should just reuse
the existing blob (and possibly sanity-check that the content matches).

The following is a greatly simplified example of that approach.
It works like this (the second commit raises a FlushError because ub's
association proxy created a second pending CommonStorage with the same
primary-key hash), but I would like to do without the exception handler
at the end ;-)

--
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
import hashlib

Base = declarative_base()

class CommonStorage(Base):
    __tablename__ = "common"
    hashval = sa.Column(sa.String, primary_key=True)
    value   = sa.Column(sa.LargeBinary)

    def __init__(self, v):
        self.value = v
        self.hashval = hashlib.md5(v).hexdigest()

class StorageUser(Base):
    __tablename__ = "user"
    id    = sa.Column(sa.Integer, primary_key=True)
    ref   = sa.Column(None, sa.ForeignKey(CommonStorage.hashval))
    rel   = orm.relation(CommonStorage)
    value = association_proxy("rel", "value")

engine = sa.create_engine("sqlite:///", echo=True)
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine, autoflush=False)()

ua, ub = StorageUser(), StorageUser()
ua.value = ub.value = "Something"
session.add(ua)
session.commit()
session.add(ub)
try:
    session.commit()
except sa.exc.FlushError:
    # I really like this error handling - we have to rollback first to
    # acknowledge the error. Cool! Never ignore errors and carry on again!
    session.rollback()
    ub.rel = session.query(CommonStorage).filter(
        ub.rel.hashval == CommonStorage.hashval).first()
    session.add(ub)
    session.commit()
---

I tried using a MapperExtension on StorageUser to replace the ref inside
before_insert and before_update by searching for a matching hash:

-
import sqlalchemy.orm.interfaces as interfaces

class StorageExtension(interfaces.MapperExtension):
    def before_insert(self, mapper, connection, instance):
        if instance.rel is not None:
            sess = orm.object_session(instance)
            existing = sess.query(CommonStorage).filter(
                instance.rel.hashval == CommonStorage.hashval).first()
            if existing is not None:
                instance.rel = existing
    before_update = before_insert


 class StorageUser(Base):
     __tablename__ = "user"
+    __mapper_args__ = dict(extension=StorageExtension())
-

While the extension gets called and tries to replace the relation, SA
still tries to insert the new entry.


Any way to get this implemented?

Greetings and thanks for any hint, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registration court: Mannheim, HRB: 109659, based in Karlsruhe,
Managing director: Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz




Re: [sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Michael Bayer
Torsten Landschoff wrote:
 While the extension gets called and tries to replace the relation, SA
 still tries to insert the new entry.


 Any way to get this implemented?

here's the relevant bit of documentation:


http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert

Column-based attributes can be modified within this method which will
result in the new value being inserted. However **no** changes to the
overall flush plan can be made, and manipulation of the Session will not
have the desired effect. To manipulate the Session within an extension,
use SessionExtension.


So here you need to use a SessionExtension, as you'd like to manipulate
the flush plan.
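
A minimal sketch of the shape of that, just to illustrate the hook (the
DedupeExtension name and the empty body are placeholders, not anything
from your code):

import sqlalchemy.orm as orm
import sqlalchemy.orm.interfaces as interfaces

class DedupeExtension(interfaces.SessionExtension):
    def before_flush(self, session, flush_context, instances=None):
        # runs before the flush plan is built, so you can still add,
        # expunge or re-point objects in the session here
        pass

Session = orm.sessionmaker(extension=DedupeExtension())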








Re: [sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Torsten Landschoff
Hi Michael,

On Fri, 2010-03-26 at 14:30 -0400, Michael Bayer wrote:

 here's the relevant bit of documentation:
 
 http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert
 
 Column-based attributes can be modified within this method which will
 result in the new value being inserted. However **no** changes to the
 overall flush plan can be made, and manipulation of the Session  will not
 have the desired effect. To manipulate the Session  within an extension,
 use SessionExtension.

Thanks for the pointer. I had read that part before but was not sure
whether I would have to modify the flush plan.

I attached the modified source code that actually works.

I dislike this solution for the following reasons:

* The extension scans through all new instances, which could be quite a
number.
* The session must be modified (okay, no real problem).
* In case multiple classes use the CommonStorage class, the
StorageExtension must be adjusted. It would be better to operate on
CommonStorage instances, but I don't know how to find the related classes
before the whole thing goes to the database (one rough idea is sketched
below).
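
One possibility I am toying with is the sketch below. It is only a
sketch: it sidesteps the problem by assuming that every referencing class
names its CommonStorage relationship "rel", and I have not tried it
beyond the toy example.

class GenericStorageExtension(interfaces.SessionExtension):
    def before_flush(self, session, flush_context, instances=None):
        # Handle any mapped class whose CommonStorage relationship is
        # called "rel", instead of hard-coding StorageUser.
        for obj in list(session.new):
            rel = getattr(obj, "rel", None)
            if isinstance(rel, CommonStorage):
                existing = session.query(CommonStorage).get(rel.hashval)
                if existing is not None and existing is not rel:
                    obj.rel = existing
                    if rel in session:
                        session.expunge(rel)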


Another question: any idea when the second SA book will be published? I
bought the Essential SQLAlchemy book, but it is a bit outdated, covering
0.4.x.

Thanks, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registration court: Mannheim, HRB: 109659, based in Karlsruhe,
Managing director:  Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz


import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.orm.interfaces as interfaces
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
import hashlib

Base = declarative_base()

class CommonStorage(Base):
    __tablename__ = "common"
    hashval = sa.Column(sa.String, primary_key=True)
    value = sa.Column(sa.LargeBinary)

    def __init__(self, v):
        self.value = v
        self.hashval = hashlib.md5(v).hexdigest()

    def joinFrom(self, session):
        # Return the already-persisted row with the same hash, if any.
        copy = session.query(CommonStorage).filter(
            self.hashval == CommonStorage.hashval).first()
        return copy or self

class StorageUser(Base):
    __tablename__ = "user"
    id = sa.Column(sa.Integer, primary_key=True)
    ref = sa.Column(None, sa.ForeignKey(CommonStorage.hashval))
    rel = orm.relation(CommonStorage)
    value = association_proxy("rel", "value")

class StorageExtension(interfaces.SessionExtension):
    def before_flush(self, session, flush_context, instances=None):
        # Replace pending duplicate blobs with the existing rows before
        # the flush plan is built.
        for d in session.new:
            if isinstance(d, StorageUser) and d.rel is not None:
                original = d.rel
                d.rel = d.rel.joinFrom(session)
                if original is not d.rel:
                    session.expunge(original)

engine = sa.create_engine("sqlite:///", echo=True)
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine, extension=StorageExtension(),
                           autoflush=False)()

ua, ub = StorageUser(), StorageUser()
ua.value = ub.value = "Something"
session.add(ua)
session.commit()
session.add(ub)
session.commit()