subject:"\[sqlalchemy\] Unifying large objects on commit\/flush"

[sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Torsten Landschoff

Hi there!

We are considering to use SQLAlchemy for a new project. Our first tests
look promising and it is a fun to use SA. But I still have a problem to
implement some special features we would like to have.

For example, I want to store large (tens to hundreds of MB) objects into
the database but keep the option open to store them into the filesystem
later. I would prefer storing them into the DB to have them under
transaction protection but OTOH I know that this can become a
performance problem. So I want to build an interface to allow external
(file or extra DB) storage later.

So instead of the blob itself I want to store a cryptographic hash (like
git, Mercurial, Fossil SCM etc. do) and index the real data from that.
If somebody tries to import the same file twice, it should just reuse
the existing blob (and possibly sanity check if the content matches).

The following example is a greatly simplified example of that approach.
It works like this, but I would like to do without the exception handler
at the end ;-)

--
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
import hashlib

Base = declarative_base()

class CommonStorage(Base):
__tablename__ = common
hashval = sa.Column(sa.String, primary_key=True)
value   = sa.Column(sa.LargeBinary)
def __init__(self, v):
self.value = v
self.hashval = hashlib.md5(v).hexdigest()

class StorageUser(Base):
__tablename__ = user
id= sa.Column(sa.Integer, primary_key=True)
ref   = sa.Column(None, sa.ForeignKey(CommonStorage.hashval))
rel   = orm.relation(CommonStorage)
value = association_proxy(rel, value)

engine = sa.create_engine(sqlite:///, echo=True)
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine, autoflush=False)()

ua, ub = StorageUser(), StorageUser()
ua.value = ub.value = Something
session.add(ua)
session.commit()
session.add(ub)
try:
session.commit()
except sa.exc.FlushError:
# I really like this error handling - we have to rollback first to
# acknowledge the error. Cool! Never ignore errors and carry on again!
session.rollback()
ub.rel = session.query(CommonStorage).filter(ub.rel.hashval == 
CommonStorage.hashval).first()
session.add(ub)
session.commit()
---

I tried using a MapperExtension on the StorageUser and to replace the
ref inside before_insert and before_update by searching for a matching
hash:

-
class StorageExtension(interfaces.MapperExtension):
def before_insert(self, mapper, connection, instance):
if instance.rel != None:
sess = orm.object_session(instance)
existing = sess.query(CommonStorage).filter(instance.rel.hashval == 
CommonStorage.hashval).first()
if existing != None:
instance.rel = existing
before_update = before_insert


 class StorageUser(Base):
 __tablename__ = user
+__mapper_args__ = dict(extension=StorageExtension())
-

While the extension gets called and tries to replace the relation, SA
still tries to insert the new entry.


Any way to get this implemented?

Greetings and thanks for any hint, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registergericht: Mannheim, HRB: 109659, Sitz: Karlsruhe,
Geschäftsführer:  Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Michael Bayer

Torsten Landschoff wrote:
While the extension gets called and tries to replace the relation, SA
still tries to insert the new entry.

Any way to get this implemented?

here's the relevant bit of documentation:

http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert

Column-based attributes can be modified within this method which will
result in the new value being inserted. However **no** changes to the
overall flush plan can be made, and manipulation of the Session will not
have the desired effect. To manipulate the Session within an extension,
use SessionExtension.

so here you need to use SessionExtension as you'd like to manipulate the
flush plan.

Greetings and thanks for any hint, Torsten

--
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registergericht: Mannheim, HRB: 109659, Sitz: Karlsruhe,
Geschäftsführer: Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz

--
You received this message because you are subscribed to the Google Groups
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Unifying large objects on commit/flush

2010-03-26 Thread Torsten Landschoff

Hi Michael,

On Fri, 2010-03-26 at 14:30 -0400, Michael Bayer wrote:

 here's the relevant bit of documentation:
 
 http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_insert
 
 Column-based attributes can be modified within this method which will
 result in the new value being inserted. However **no** changes to the
 overall flush plan can be made, and manipulation of the Session  will not
 have the desired effect. To manipulate the Session  within an extension,
 use SessionExtension.

Thanks for the pointer. I read that part before but was not sure if I
have to modify the flush plan.

I attached the modified source code that actually works.

I dislike this solution for the following reasons:

* The extension scans through all new instances which could be quite a
number.
* The session must be modified (okay, no real problem).
* In case multiple classes use the CommonStorage class, the
StorageExtension must be adjusted. It would be better to operate on
CommonStorage instances but I don't know how to find the related classes
before the whole thing goes to the database.


Another question: Any idea when the second SA book will be published? I
bought the Essential SA book but it is a bit outdated covering 0.4.x.

Thanks, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registration court: Mannheim, HRB: 109659, based in Karlsruhe,
Managing director:  Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.orm.interfaces as interfaces
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
import hashlib

Base = declarative_base()

class CommonStorage(Base):
__tablename__ = common
hashval = sa.Column(sa.String, primary_key=True)
value = sa.Column(sa.LargeBinary)
def __init__(self, v):
self.value = v
self.hashval = hashlib.md5(v).hexdigest()
def joinFrom(self, session):
copy = session.query(CommonStorage).filter(self.hashval==CommonStorage.hashval).first()
return copy or self

class StorageUser(Base):
__tablename__ = user
id = sa.Column(sa.Integer, primary_key=True)
ref = sa.Column(None, sa.ForeignKey(CommonStorage.hashval))
rel = orm.relation(CommonStorage)
value = association_proxy(rel, value)

class StorageExtension(interfaces.SessionExtension):
def before_flush(self, session, flush_context, instances=None):
for d in session.new:
if isinstance(d, StorageUser) and d.rel is not None:
original = d.rel
d.rel = d.rel.joinFrom(session)
if original is not d.rel:
session.expunge(original)

engine = sa.create_engine(sqlite:///, echo=True)
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine, extension=StorageExtension(), autoflush=False)()

ua, ub = StorageUser(), StorageUser()
ua.value = ub.value = Something
session.add(ua)
session.commit()
session.add(ub)
session.commit()

[sqlalchemy] Unifying large objects on commit/flush

Re: [sqlalchemy] Unifying large objects on commit/flush

Re: [sqlalchemy] Unifying large objects on commit/flush

3 matches

Site Navigation

Mail list logo

Footer information