Re: [sqlalchemy] A question regarding hybrid properties, relationships and schema changes
On Tue, Jun 24, 2014 at 3:35 PM, Mike Bayer mike...@zzzcomputing.com wrote:

    On 6/23/14, 8:09 PM, Ken Lareau wrote:

        if apptier:
            subq = (
                Session.query(
                    Package.pkg_name,
                    Package.version,
                    Package.revision,
                    AppDefinition.app_type,
                    AppDeployment.environment
                )
                .join(Deployment)
                .join(AppDeployment)
                .join(AppDefinition)
                .filter(Package.pkg_name == package_name)
                .filter(AppDeployment.environment == env)
                .filter(AppDeployment.status != 'invalidated'))

        [...]

        # The actual column name must be used in the subquery
        # usage below; DB itself should be corrected
        versions = (Session.query(subq.c.appType,
                                  subq.c.version,
                                  subq.c.revision)
                    .group_by(subq.c.appType, subq.c.environment)
                    .all())

        The parts in the subquery work fine, but accessing the 'column' in
        the final query leads to this:

        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/home/klareau/repos/git/tagopsdb/tagopsdb/deploy/deploy.py", line 234, in find_deployed_version
            .group_by(subq.c.appType, subq.c.environment)
          File "/home/klareau/.virtualenvs/tds/lib/python2.6/site-packages/sqlalchemy/util/_collections.py", line 174, in __getattr__
            raise AttributeError(key)
        AttributeError: environment

        This is not completely surprising, but I'm uncertain as to the best
        way to fix this... help? :)

    Nothing is obviously wrong, assuming you are calling subquery() on subq
    at the end. It will be a select() construct which will have a column
    called .environment on the .c collection because
    AppDeployment.environment is present. Can't say why this attribute is
    not here without full details.

Oops - there were some additional filters I left out, but this is added
before the final query:

    subq = (subq.order_by(AppDeployment.realized.desc())
                .subquery(name='t_ordered'))

Not sure why it isn't working, but today my coworker and I massively rewrote
one of the other methods to avoid this same issue, so maybe we should try
the same for this one... though it would be nice to know what I goofed here.
I'm not sure what additional information I can give that would help offhand.

--
- Ken Lareau
[sqlalchemy] postgresql large objects
Hi,

I'm wondering how I might get at postgresql's large object type (lob). It's
not to be confused with TOASTED bytea, which is currently limited to 1 GiB
in theory yet in practice much lower (for me, 400 MiB) - the lob is a
special table + API designed to handle very large binary objects, from a few
hundred MiBs to, more recently, TiBs. I don't see appropriate definitions
anywhere and can't find any real mention of it with sqlalchemy.

psycopg2 has support for it and calls it lobject; it provides a file-like
interface to the lob, which is a good mapping, since with libpq you use
lo_creat, lo_seek, lo_write and lo_read to work with these beasts.

I took a look at UserDefinedType, but bind_processor doesn't distinguish
between inserts and updates. With inserts, you'd use an oid allocated from
lo_creat in the transaction. On updates, you'd use lo_trunc/lo_write. As one
more constraint, you must be in a transaction before any of these functions
are usable.

To reference large objects, since they are explicitly an out-of-table
storage, the postgresql-specific oid is used (which allows garbage
collection, referential integrity checks, etc.). I'll also mention that
other tables reference these large objects via oids, something like smart
pointers in postgres.

It'd be great to plug large objects into sqlalchemy properly - but can it be
done?

-Jason
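[For concreteness, a minimal sketch of the file-like lobject interface
described above, using plain psycopg2; the DSN and payload are placeholders:]

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # placeholder DSN

    # lobject calls must happen inside a transaction; psycopg2 opens one
    # implicitly on first use.
    lob = conn.lobject(0, 'wb')    # oid=0: let the server allocate a new lob
    lob.write(b'\x89HDF...')       # lo_write; placeholder payload
    oid = lob.oid                  # the oid other tables would reference
    lob.close()

    lob = conn.lobject(oid, 'rb')  # reopen for reading
    lob.seek(0)                    # lo_seek
    data = lob.read()              # lo_read
    lob.close()
    conn.commit()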
[sqlalchemy] Treating many-to-many relationships as many-to-one
Hello,

I am trying to use SQLAlchemy to map a database that was made available to
me, whose contents and schema cannot be altered. The database has a few
cases where a many-to-one relationship is treated as a many-to-many - that
is, instead of the child key being set directly onto the parent record, the
relationship is mapped through an association table. So the mapping looks
like this:

    vname_association = Table(
        'VNameFactoid', Base.metadata,
        Column('vnameKey', Integer, ForeignKey('VariantName.vnameKey')),
        Column('factoidKey', Integer, ForeignKey('Factoid.factoidKey')))

    class Factoid(Base):
        __tablename__ = 'Factoid'
        tstamp = Column(Time)
        engDesc = Column(Text)
        origLDesc = Column(Text)
        factoidKey = Column(Integer, primary_key=True)
        vnameInfo = relationship('VariantName', secondary=vname_association,
                                 backref='factoidData')

    class VariantName(Base):
        __tablename__ = 'VariantName'
        tstamp = Column(Time)
        vnameKey = Column(Integer, primary_key=True)
        name = Column(String)

This means that I can get at the variant name information from the factoid
by saying:

    my_variant_name = some_factoid.vnameInfo[0].name

But it would be much nicer if I could dispense with that [0], since I know
that there will never be more than one record returned in the list. Is there
something I can do to be able to say this?

    my_variant_name = some_factoid.vnameInfo.name

Best,
-tara
Re: [sqlalchemy] Oracle with sequence for primary key and that sequence out of sync
On 6/25/14, 5:50 AM, Ofir Herzas wrote:

    Sorry to barge in, but I'm having the exact same issue, and I'm pretty
    sure no one altered the sequence manually. I'm using sqlalchemy 0.8,
    python 2.7.6, rhel 6.5, oracle 10g, and cx_oracle with the same
    connection string as above. This issue started just recently, after
    running OK more than a thousand times. I should also say that my
    application is installed at several customers, and I have this issue
    only at the one using Oracle.

I'm open to possibilities for how this could happen, short of the ALTER
SEQUENCE command being emitted, in which case feel free to grep SQLAlchemy's
codebase for this clause (it's not there). From my vantage point, if there
is actually an issue that is implicitly making this happen outside of
application code, it would have to be on the driver or server side somehow.

On Tuesday, June 3, 2014 1:18:42 AM UTC+3, Michael Bayer wrote:

    On Jun 2, 2014, at 1:17 PM, Scott Koranda skor...@gmail.com wrote:

        I investigated and found that the sequence seq_admin_groups_id was
        now at the value 68, after having been used previously to insert
        rows with IDs in the 500s. I stopped the code and used sqlplus to
        change the sequence back to a value in the high 500s. I then
        restarted the code and the exception no longer occurred.

        I am unable to explain how the sequence seq_admin_groups_id went
        from in the 500s and working fine to suddenly being 68. The only
        place in the Python code where the sequence is used explicitly is in
        the definition of the AdminGroup() class. I would be grateful for
        any insights on how the sequence might have become out of sync, or
        anything I can change in the code to prevent it from happening
        again.

    There's nothing on the Python library side that could do that; someone
    had to have run an ALTER SEQUENCE on the database side in order for that
    to happen.
Re: [sqlalchemy] Treating many-to-many relationships as many-to-one
On 6/25/14, 7:11 AM, Tara Andrews wrote:

    [...]

    But it would be much nicer if I could dispense with that [0], since I
    know that there will never be more than one record returned in the list.
    Is there something I can do to be able to say this?

        my_variant_name = some_factoid.vnameInfo.name

Yes - you can apply the uselist=False option to the relationship. If the
relationship loads more than one row, you'll get a warning.
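[For illustration, the suggested change applied to the mapping from the
question - a minimal sketch in which only the uselist=False argument is
new:]

    class Factoid(Base):
        __tablename__ = 'Factoid'
        tstamp = Column(Time)
        engDesc = Column(Text)
        origLDesc = Column(Text)
        factoidKey = Column(Integer, primary_key=True)
        # uselist=False makes vnameInfo a scalar attribute instead of a list
        vnameInfo = relationship('VariantName', secondary=vname_association,
                                 uselist=False, backref='factoidData')

    # afterwards this works directly:
    # my_variant_name = some_factoid.vnameInfo.name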
Re: [sqlalchemy] A question regarding hybrid properties, relationships and schema changes
On 6/25/14, 2:26 AM, Ken Lareau wrote:

    [...]

    Oops - there were some additional filters I left out, but this is added
    before the final query:

        subq = (subq.order_by(AppDeployment.realized.desc())
                    .subquery(name='t_ordered'))

    Not sure why it isn't working, but today my coworker and I massively
    rewrote one of the other methods to avoid this same issue, so maybe we
    should try the same for this one... though it would be nice to know what
    I goofed here. I'm not sure what additional information I can give that
    would help offhand.

What are the names that are actually present on subq.c?

    print list(subq.c)

should give an indication.
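[A hedged sketch of that inspection, plus one common remedy if the expected
name turns out not to be present: labeling the columns explicitly in the
subquery so that the names on .c are predictable. The label names are
assumptions, chosen to match the attribute access in the failing query:]

    print list(subq.c)   # shows the column names actually on the subquery

    subq = (Session.query(
                Package.pkg_name,
                Package.version,
                Package.revision,
                AppDefinition.app_type.label('appType'),
                AppDeployment.environment.label('environment'))
            .join(Deployment)
            .join(AppDeployment)
            .join(AppDefinition)
            .filter(Package.pkg_name == package_name)
            .filter(AppDeployment.environment == env)
            .filter(AppDeployment.status != 'invalidated')
            .order_by(AppDeployment.realized.desc())
            .subquery(name='t_ordered'))

    versions = (Session.query(subq.c.appType, subq.c.version,
                              subq.c.revision)
                .group_by(subq.c.appType, subq.c.environment)
                .all())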
Re: [sqlalchemy] Treating many-to-many relationships as many-to-one
On Wednesday, June 25, 2014 3:27:48 PM UTC+2, Michael Bayer wrote:

    Yes - you can apply the uselist=False option to the relationship. If the
    relationship loads more than one row, you'll get a warning.

That did it - thanks!

Best,
-tara
Re: [sqlalchemy] postgresql large objects
On 6/25/14, 2:35 AM, Jason Newton wrote:

    [...]

    It'd be great to plug large objects into sqlalchemy properly - but can
    it be done?

Well, we just added the OID type in 0.9.5, so you at least have that.

The PG LOB feature is very sparsely documented - in PG's docs, they only
seem to document the C API
(www.postgresql.org/docs/9.2/static/lo-interfaces.html); feel free to point
me to better documentation on this.

As far as transparently embedding this into INSERT/UPDATE, my first thought
is that this might be unwise, considering how far removed these functions
are from any kind of mainstream usage in such statements - particularly if
separate statements have to be called per value to get at OIDs or similar.
That PG's docs barely mention this whole feature, much less any kind of
regular SQL integration, is a red flag. PG's BYTEA type is already
arbitrarily large, so there is probably not much interest in a type like
this. If it's the streaming feature you're looking for, SQLA's usual
approach, such as that of Oracle's LOB, is to pre-stream it on the result
set side (necessary, because fetching a batch of rows requires it), and
cx_Oracle doesn't provide too much option to stream on the write side. I've
dealt a lot with streaming datatypes back in the day, but sometime in the
early 2000s everyone just stopped using them.

As the feature involves SQL functions, I don't think you'd be dealing only
with bind_processor(); the SQL functions themselves would probably be
applied via SQL-level processing, see
http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#applying-sql-level-bind-result-processing.
It's true these functions aren't given access to the compiler context where
you'd be able to discern INSERT from UPDATE, so I'd probably use two
variants of datatype, an InsertLOB and UpdateLOB datatype, then apply these
to an INSERT or UPDATE statement at runtime, probably using a @compiles
decorator - http://docs.sqlalchemy.org/en/rel_0_9/core/compiler.html - that
is, take the existing LOB type and just swap it out for InsertLOB/UpdateLOB.
Adding compiler context to TypeEngine is something we could look into as a
feature as well, so that bind_expression() has this available somehow and
switching around types wouldn't be needed.

Building this would involve stringing together hooks that are not quite set
up for this; however, for me to really assist here I'd need to see exact
examples of what INSERT, UPDATE and anything else look like in conjunction
with these functions.

I would note that psycopg2 itself also provides for extension types,
including custom Connection and Cursor subclasses. If a lot of things have
to happen when these types are in play, it might be something that can occur
at that level; PG's type API is obviously a lot more PG-specific.
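[As a rough illustration of the SQL-level bind/result processing discussed
above, a hedged sketch of a single LOB type using
bind_expression()/column_expression(). It sidesteps the INSERT-vs-UPDATE
problem entirely by leaning on the server-side helpers lo_from_bytea() and
lo_get() (PostgreSQL 9.4+), which is an assumption and gives up the
streaming benefit:]

    from sqlalchemy import func
    from sqlalchemy.types import UserDefinedType

    class PGLargeObject(UserDefinedType):
        # Sketch only: store/fetch a whole LOB through server-side
        # functions rather than the file-like client API.

        def get_col_spec(self):
            return "OID"  # rows reference the large object by oid

        def bind_expression(self, bindvalue):
            # on the way in, create a new large object from the bytea
            # parameter; 0 = let the server choose the oid
            return func.lo_from_bytea(0, bindvalue, type_=self)

        def column_expression(self, col):
            # on the way out, read the whole object back as bytea
            return func.lo_get(col)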
RE: [sqlalchemy] Oracle with sequence for primary key and that sequence out of sync
Didn't say it was your responsibility, just thought you should know there
might be an issue there. I'll modify the sequence manually and get back to
it should the problem persist.

From: sqlalchemy@googlegroups.com [mailto:sqlalchemy@googlegroups.com] On Behalf Of Mike Bayer
Sent: Wednesday, June 25, 2014 4:27 PM
To: sqlalchemy@googlegroups.com
Subject: Re: [sqlalchemy] Oracle with sequence for primary key and that sequence out of sync

    [...]

    I'm open to possibilities for how this could happen, short of the ALTER
    SEQUENCE command being emitted, in which case feel free to grep
    SQLAlchemy's codebase for this clause (it's not there). From my vantage
    point, if there is actually an issue that is implicitly making this
    happen outside of application code, it would have to be on the driver or
    server side somehow.
Re: [sqlalchemy] inheriting from mapped classes
On Tuesday, June 24, 2014 9:40:02 PM UTC-4, Victor Olex wrote:

    What I am aiming for is to provide users a library of base class(es),
    which are mapped using SQLAlchemy. The classes are then meant to be
    extended by users with business logic. I am not hell-bent on using
    inheritance for this, but for now I went with your __abstract__ = True
    solution, only in a somewhat inverted way.

There's another thread from within the past two weeks from someone else
trying to tackle this problem. I mentioned in that post, and I'll mention
again here: the best method I found was to use a form of the registry
pattern, where child classes inherit from the base class and declare any
overrides plus the types of relationships they require or provide. As the
classes are initialized, this data is recorded in a registry. After
initialization, relationships are mapped onto the classes using the data in
the registry.
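[For readers following along, a minimal sketch of the __abstract__ = True
arrangement being referred to; the class and column names are invented for
illustration:]

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    # Library-provided base: __abstract__ = True means no table or mapper
    # is created for this class, only for concrete subclasses.
    class UserBase(Base):
        __abstract__ = True
        id = Column(Integer, primary_key=True)
        name = Column(String(50))

    # Downstream code subclasses it, adds business logic, and picks the
    # table name.
    class User(UserBase):
        __tablename__ = 'users'

        def greeting(self):
            return "Hello, %s" % self.name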
Re: [sqlalchemy] Oracle with sequence for primary key and that sequence out of sync
On Monday, June 2, 2014 6:18:42 PM UTC-4, Michael Bayer wrote:

    there's nothing on the Python library side that could do that, someone
    had to have run an ALTER SEQUENCE on the database side in order for that
    to happen.

I don't even think you could use ALTER SEQUENCE in this situation. I fondly
remember Oracle sequences needing to be dropped and re-added with a lot of
NEXTVALs... or using chunks of PL/SQL scripts to otherwise modify the
sequence. Doing a quick look online, it seems like things still work the
same way.

If I were in this situation, I would enable a lot of query and connection
logging, and then comb through the logs to see what was happening. For a
variety of reasons, this doesn't seem like it should be possible. Is it
possible that some records were being inserted with a numeric id that was
originally generated by .nextval() in your app, but has since been
incremented only in Python?
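[For what it's worth, one way to get that level of logging - a sketch using
SQLAlchemy's standard logger names:]

    import logging

    logging.basicConfig()
    # all SQL statements and parameters:
    logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
    # connection pool checkouts/checkins:
    logging.getLogger('sqlalchemy.pool').setLevel(logging.DEBUG)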
Re: [sqlalchemy] postgresql large objects
Hi, I've replied inline below.

On Wed, Jun 25, 2014 at 6:46 AM, Mike Bayer mike...@zzzcomputing.com wrote:

    well we just added the OID type in 0.9.5, so you at least have that.

I came across the entry in the issue tracker a little bit after submitting.
As usual for me, its support wasn't added very long ago.

    The PG LOB feature is very sparsely documented - in PG's docs, they only
    seem to document the C API
    (www.postgresql.org/docs/9.2/static/lo-interfaces.html); feel free to
    point me to better documentation on this.

There are also server-side functions; they operate almost exactly the same
as the client-side API:
http://www.postgresql.org/docs/9.3/static/lo-funcs.html. There is no better
documentation than those two that I know of, but they were sufficient for
me.

    As far as transparently embedding this into INSERT/UPDATE, my first
    thought is that this might be unwise [...] I've dealt a lot with
    streaming datatypes back in the day, but sometime in the early 2000s
    everyone just stopped using them.

Seems to fly in the face of the point of SQLA, although the integration
difficulties are appreciated. Most advanced postgresql drivers in any
language's bindings have added support for this type, although none of them
can hide that it's file-like.

PG's BYTEA is NOT arbitrarily large; it has quite real practical limits, and
I've hit them regularly in storing compressed HDF5 documents in the database
as part of a bench-testing framework. The theoretical limit is 1 GB, but
this limit is far less in practice
(http://www.postgresql.org/message-id/cafj8pracfkoinp2uxeizod5krx29n2ofsoldh0w6ej7rxko...@mail.gmail.com);
see also http://www.postgresql.org/docs/9.3/static/storage-toast.html.

I'm not looking for streaming, just retrieving/setting binary buffers. It'd
be nice to translate it transparently to HDF5 Python in-memory objects (i.e.
h5py).

    As the feature involves SQL functions, I don't think you'd be dealing
    only with bind_processor() [...] Adding compiler context to TypeEngine
    is something we could look into as a feature as well, so that
    bind_expression() has this available somehow and switching around types
    wouldn't be needed.

Separate datatypes is clearly not a good approach to this.

    Building this would involve stringing together hooks that are not quite
    set up for this; however, for me to really assist here I'd need to see
    exact examples of what INSERT, UPDATE and anything else look like in
    conjunction with these functions.

Everything starts with a transaction block. Hand-crafted inserts use
server-side lo_create(-1) (in SQL), which allocates a new large object, and
a RETURNING clause to get the oid back to the client side in one go. Then I
start using the lobject API on the returned oid.

In patches I've made to cppdb, in the bind of std::istream I call
client-side lo_create, lo_open, a sequence of lo_writes, and finally
lo_close. In retrospect this doesn't handle updates particularly well, and
maybe this could be used in sqlalchemy as-is too - an update just allocates
a new large object and unreferences the old one. This would leave it up to
vacuumlo to GC those updated lobs. In my workload, however, my lobs so far
are immutable - they are results of simulations, and this is the way I've
worked to date. It probably won't stay that way forever.

    I would note that psycopg2 itself also provides for extension types,
    including custom Connection and Cursor subclasses. If a lot of things
    have to happen when these types are in play, it might be something that
    can occur at that level; PG's type API is obviously a lot more
    PG-specific.

-Jason
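[To make the described flow concrete, a hedged psycopg2 sketch of the
insert/select pattern above; the table and column names are invented:]

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # placeholder DSN
    payload = b'\x89HDF...'                 # placeholder payload

    with conn.cursor() as cur:
        # write the large object via the client-side file-like API
        lob = conn.lobject(0, 'wb')  # 0 = server allocates the oid
        lob.write(payload)
        oid = lob.oid
        lob.close()

        # reference the new large object from an ordinary row
        cur.execute(
            "INSERT INTO results (name, data_oid) VALUES (%s, %s)",
            ("run-42", oid))
    conn.commit()

    with conn.cursor() as cur:
        cur.execute("SELECT data_oid FROM results WHERE name = %s",
                    ("run-42",))
        oid = cur.fetchone()[0]
        lob = conn.lobject(oid, 'rb')
        data = lob.read()
        lob.close()
    conn.commit()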
Re: [sqlalchemy] postgresql large objects
On 6/25/14, 2:53 PM, Jason Newton wrote:

    Seems to fly in the face of the point of SQLA, although the integration
    difficulties are appreciated. Most advanced postgresql drivers in any
    language's bindings have added support for this type, although none of
    them can hide that it's file-like.

SQLA doesn't go out of its way for unusual, very database-specific features
that up til now have demonstrated little to no real-world use (nobody's ever
asked for this feature before, and googling about Postgresql LOBs turns up
very little). There are tons of examples of features like this across many
different database backends. If they are easy to add, we add them, or if
they are easy to address via a recipe, we add the recipe to the wiki. But if
the behavior requires substantial changes to the core and dialect, and the
ratio of complexity of impact to sparseness of need is really high, it's not
worth it, and actually kind of damaging to most users, to complicate the
library for use cases that are extremely rare and can just as well be
addressed by dropping down to raw DBAPI code.
Complications/destabilizations/performance degradations that are hoisted
onto the whole userbase for the benefit of a single feature that is
virtually never needed are the wrong choice to make; I'm presented with this
choice all the time, and there's nearly always work to be done in
extricating ill-conceived features and behaviors that went in too quickly.
I'm pretty confident that this feature won't require any of that, but that
remains to be seen.

Just in case this wasn't apparent, you certainly *can* use psycopg2's
bindings when you're in an otherwise SQLAlchemy app. Worst case, you can
retrieve a raw psycopg2 connection using connection.raw_connection and do
whatever you need. If you truly have some use for LOBs, SQLAlchemy isn't
preventing you from using it; it's just not providing any niceties around
it. The fact that these unusual use cases are never prohibited by SQLA
further raises the bar to adding first-class support for them.

    PG's BYTEA is NOT arbitrarily large; it has quite real practical limits,
    and I've hit them regularly in storing compressed HDF5 documents in the
    database as part of a bench-testing framework. The theoretical limit is
    1 GB, but this limit is far less in practice
    (http://www.postgresql.org/message-id/cafj8pracfkoinp2uxeizod5krx29n2ofsoldh0w6ej7rxko...@mail.gmail.com);
    see also http://www.postgresql.org/docs/9.3/static/storage-toast.html.

    I'm not looking for streaming, just retrieving/setting binary buffers.
    It'd be nice to translate it transparently to HDF5 Python in-memory
    objects (i.e. h5py).

Python translation is very easy in SQLA; it's just if you have special needs
for SQL syntaxes that special behaviors may be needed. So far it sounds like
the only blocking factor is that bind_sql needs to distinguish between
INSERT and UPDATE. That's not a terribly tall order, though it is
inconvenient in that the API would need a backwards-compatibility layer.

    [...]

    Separate datatypes is clearly not a good approach to this.

Let me clarify that these separate datatypes would be totally invisible to
the user. The user would work with a single LOB type. Translation to
Insert/Update versions would not be explicit and would occur at the point at
which the insert/update construct is compiled. At the moment this may be the
best approach short of modifying the library (but then again, I can't say
much about the approach, because I have little idea what the SQL we're
talking about looks like).

    Everything starts with a transaction block.

The psycopg2 DBAPI is implicitly in a transaction block at all times, unless
autocommit mode is set up, so there shouldn't be any problem here.

    Hand-crafted inserts use server-side lo_create(-1) (in SQL), which
    allocates a new large object, and a RETURNING clause to get the oid back
    to the client side in one go. Then I start using the lobject API on the
    returned oid.

I'd like to see explicit SQL, preferably in the form of a psycopg2 script
that illustrates all the operations you wish to support and specifically how
they must interact with the database.
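[A minimal sketch of the escape hatch mentioned above: getting the raw
psycopg2 connection from an otherwise-SQLAlchemy app and using lobject on it
directly; the URL is a placeholder:]

    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    conn = engine.raw_connection()   # DBAPI (psycopg2) connection from the pool
    try:
        lob = conn.lobject(0, 'wb')  # psycopg2 large-object handle
        lob.write(b'some very large payload')
        oid = lob.oid
        lob.close()
        conn.commit()
    finally:
        conn.close()                 # return the connection to the pool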
Re: [sqlalchemy] postgresql large objects
    Just in case this wasn't apparent, you certainly *can* use psycopg2's
    bindings when you're in an otherwise SQLAlchemy app. Worst case, you can
    retrieve a raw psycopg2 connection using connection.raw_connection and
    do whatever you need. If you truly have some use for LOBs, SQLAlchemy
    isn't preventing you from using it; it's just not providing any niceties
    around it. The fact that these unusual use cases are never prohibited by
    SQLA further raises the bar to adding first-class support for them.

Yes, this much I knew would work; it's just clunkier than it could be.

        PG's BYTEA is NOT arbitrarily large [...] I'm not looking for
        streaming, just retrieving/setting binary buffers. It'd be nice to
        translate it transparently to HDF5 Python in-memory objects (i.e.
        h5py).

    Python translation is very easy in SQLA; it's just if you have special
    needs for SQL syntaxes that special behaviors may be needed. So far it
    sounds like the only blocking factor is that bind_sql needs to
    distinguish between INSERT and UPDATE. That's not a terribly tall order,
    though it is inconvenient in that the API would need a
    backwards-compatibility layer.

Perhaps you can keep backwards compatibility by introspecting the callbacks
to determine the arity of their arguments.

    [...]

    Let me clarify that these separate datatypes would be totally invisible
    to the user. The user would work with a single LOB type. Translation to
    Insert/Update versions would not be explicit and would occur at the
    point at which the insert/update construct is compiled. At the moment
    this may be the best approach short of modifying the library (but then
    again, I can't say much about the approach, because I have little idea
    what the SQL we're talking about looks like).

Ah. I thought it would be done in the declarative table descriptions. Being
relatively new to SQLA, I'm not sure what that would look like.

    The psycopg2 DBAPI is implicitly in a transaction block at all times,
    unless autocommit mode is set up, so there shouldn't be any problem
    here.

AFAIK it won't work with autocommit, but I don't have documentation to prove
that; I just remember getting errors, I think in JDBC, pertaining to it.

    I'd like to see explicit SQL, preferably in the form of a psycopg2
    script that illustrates all the operations you wish to support and
    specifically how they must interact with the database.

I've attached example usages. As I indicated in my prior email, right now I
only do inserts/selects.

        In patches I've made to cppdb, in the bind of std::istream I call
        client-side lo_create, lo_open, a sequence of lo_writes, and finally
        lo_close. In retrospect this doesn't handle updates particularly
        well, and maybe this could be used in sqlalchemy as-is too - an
        update just allocates a new large object and unreferences the old
        one.

    I wonder why the mechanics of these functions can't be more cleanly
    wrapped into server-side stored procedures? If they can be distilled
    into simple insert_lob()/update_lob() functions, that would potentially
    be a better separation of duties.

I think it's because only small lobs could rely on that - huge lobs that
exceed working memory fit better with a file-like API. It would be a nice
convenience to have, perhaps. These same small lobs (ie lobs that fit in
Re: [sqlalchemy] postgresql large objects
On 6/25/14, 5:35 PM, Jason Newton wrote:

    [...]

    I've attached example usages. As I indicated in my prior email, right
    now I only do inserts/selects.

Here's my immediate thought about the INSERT (and the UPDATE): the first way
is using events; the second would attempt to move this system into something
more native to the psycopg2 dialect.

Use the before_cursor_execute() and after_cursor_execute() events to get at
this (a skeleton of these steps appears after this message):

http://docs.sqlalchemy.org/en/latest/core/events.html#sqlalchemy.events.ConnectionEvents.before_cursor_execute
http://docs.sqlalchemy.org/en/latest/core/events.html#sqlalchemy.events.ConnectionEvents.after_cursor_execute

1. Create a new type called PGLOB or whatever you want. For starters, this
   can just be a blank UserDefinedType subclass.

In before_cursor_execute:

2. Determine if INSERT or UPDATE using context.isinsert, context.isupdate.

3. Examine the datatypes that SQLAlchemy is using here by looking at
   context.compiled.binds. The values here are BindParameter objects; you
   want to look in those for the ones that are of type PGLOB.

4. From context.compiled.binds, you have the names of the bound params with
   the type. Search and replace the statement for occurrences of that bound
   parameter, replacing with lo_creat(-1) or whatever you need there.

5. Also append to the RETURNING clause those cols you need.

6. The statement as a return value will be used, if you set up the event
   with retval=True (see the docs).

In after_cursor_execute:

7. Call fetchone() to get the RETURNING values. Get that OID you care about,
   then do the work with conn.lobject and all that. Hopefully this doesn't
   mess up the existing cursor state.

8. Now the tricky part: SQLAlchemy needs that row if you're doing implicit
   returning to get at primary key values. psycopg2's cursor seems to have a
   scroll() method that works for client-side cursors. I'd scroll it back
   one so that SQLA gets the state it expects.

An alternative, more plugged-in system: we would be creating new features in
sqlalchemy/dialects/postgresql/psycopg2.py - PGExecutionContext_psycopg2.
Similar hooks are available here which you can use to accomplish similar
tasks; you'd want to look at the pre_exec(), post_exec() and possibly
post_insert() methods, and maybe even _fetch_implicit_returning(). If SQLA
were to support this more natively, things would be happening at this level.
But again, I really wouldn't want all kinds of hardcoded expectations of LOB
objects in this object taking up processing time for the vast majority of
use cases that don't use LOBs, so the extra logic here should be contained
within something that can easily be set aside based on options or similar.
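[A hedged skeleton of the event-based recipe above, steps 1-8. The hook
wiring and context attributes are real; PGLOB is the invented marker type
from step 1, and the actual statement rewriting of steps 4-5 is left as
comments, because its exact form depends on the table involved:]

    from sqlalchemy import create_engine, event
    from sqlalchemy.types import UserDefinedType

    engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

    class PGLOB(UserDefinedType):
        # step 1: a blank UserDefinedType subclass used purely as a marker
        def get_col_spec(self):
            return "OID"

    def _lob_param_names(context):
        # step 3: find bound parameters whose type is the PGLOB marker
        return [name for name, bp in context.compiled.binds.items()
                if isinstance(bp.type, PGLOB)]

    @event.listens_for(engine, "before_cursor_execute", retval=True)
    def _before(conn, cursor, statement, parameters, context, executemany):
        # step 2: only act on INSERT/UPDATE statements
        if context.compiled is not None and \
                (context.isinsert or context.isupdate):
            for name in _lob_param_names(context):
                # steps 4-5 would go here: replace the placeholder for
                # `name` in `statement` with lo_creat(-1), and append the
                # column to the RETURNING clause so the new oid comes back
                pass
        # step 6: with retval=True, what we return is what gets executed
        return statement, parameters

    @event.listens_for(engine, "after_cursor_execute")
    def _after(conn, cursor, statement, parameters, context, executemany):
        if context.compiled is not None and \
                (context.isinsert or context.isupdate) and \
                _lob_param_names(context):
            row = cursor.fetchone()  # step 7: RETURNING row with the oid
            # ... open conn.connection.lobject(oid, 'wb') and write data ...
            cursor.scroll(-1)        # step 8: rewind so SQLA sees the row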
Re: [sqlalchemy] Joins across differently-sharded tables
On 6/25/14, 6:36 PM, Neil Hooey wrote:

    Is it possible to do a join query on two tables that have different
    shard keys in SQLAlchemy?

    For example, if you have two tables:

        users: [userId, name] (sharded on userId)
        messages: [messageId, userId, message] (sharded on messageId)

    could you select all of the messages for a particular user with:

        select u.name, m.userId, m.messageId, m.message
        from messages m
        join users u on m.userId = u.userId

    (even though messages are sharded on messageId instead of userId)? I
    haven't seen an example of this in the code.

There's no way to do that, because two shards are accessed via two totally
different database connections.
[sqlalchemy] Aggregate functions across shards
Do aggregate functions work across shards in SQLAlchemy? Such as count, sum, etc.?
Re: [sqlalchemy] Aggregate functions across shards
On 6/25/14, 6:41 PM, Neil Hooey wrote:

    Do aggregate functions work across shards in SQLAlchemy? Such as count,
    sum, etc.?

Not across them, no. You need to merge those yourself.
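[For illustration, a hedged sketch of doing that merge by hand with the
horizontal-sharding extension's set_shard(); the shard ids and the Message
model are invented:]

    from sqlalchemy import func

    shard_ids = ['shard1', 'shard2']  # ids registered with the ShardedSession

    # run the aggregate once per shard, then combine in Python
    total = sum(
        session.query(func.count(Message.messageId))
               .set_shard(shard_id)
               .scalar()
        for shard_id in shard_ids
    )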