Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Achim

On Wednesday, November 6, 2013 21:58:53 UTC+1, Michael Bayer wrote:

 I wrote a full post regarding this topic on stackoverflow at  
 http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly/11769768#11769768.
   If you start with this, I can answer more specific questions. 


The article was very helpful, thanks. I still want to figure out the best 
balance between convenience and speed for my use case. Does the following 
make sense, and is it possible?

I work only with PostgreSQL and I'm sure that all involved objects have a 
unique id column called 'id'. So before doing a session.commit(), I could 
check how many objects are in my session. As I'm just bulk inserting, I know 
that all of them are new and don't have their id set yet. Now I ask the 
database for that number of new ids, iterate over the objects in my session, 
and set the ids. Internally all ids would come from a single sequence, so I 
don't have to care about object types and so on. Afterwards SQLAlchemy should 
be aware that the ids have already been set, so no generated ids have to be 
returned, and the session.commit() should be much simpler and faster.
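
In code I imagine something like this (just a sketch; the helper and the 
sequence name 'global_id_seq' are made up):

from sqlalchemy import text

def preassign_ids(session, seq_name='global_id_seq'):
    # everything pending in the session is a new insert without an id yet
    pending = [obj for obj in session.new if obj.id is None]
    if not pending:
        return
    # draw exactly len(pending) values from the shared sequence
    # in a single round trip
    rows = session.execute(
        text("SELECT nextval(:seq) FROM generate_series(1, :n)"),
        {'seq': seq_name, 'n': len(pending)})
    for obj, (new_id,) in zip(pending, rows):
        obj.id = new_id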

Sounds like a still quite simple, but hopefully much faster solution. Do 
you agree?

kind regards,
Achim



[sqlalchemy] Is it possible to obtain actual values inserted into database in a SessionExtension?

2013-11-13 Thread Mark Bird
I am trying to write a generic auditing extension, but I have some 
inconsistencies I'd like to iron out if possible.

The problem I have is that when an object is updated via a relationship 
rather than a Column attribute, the value returned from 
attributes.get_history is a SQLAlchemy object, and not the value that is in 
the database. This is more pronounced on a many-to-many relationship - the 
secondary table isn't marked as new/dirty, even though that is the table 
that will actually have data inserted into it. Instead the 2 tables joined 
by the secondary table are marked as dirty, although no data is changed in 
those tables. I understand this makes sense at the SQLAlchemy level - but 
is there a way to get the actual database changes? 

A code example to illustrate this:

from sqlalchemy.types import Integer
from sqlalchemy import Column, ForeignKey, create_engine
from sqlalchemy.orm import relationship, backref, attributes, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.interfaces import SessionExtension

BASE = declarative_base()

ENGINE = create_engine("sqlite://")

class AuditListener(SessionExtension):
    def after_flush(self, session, *args):
        for attr in ("new", "dirty", "deleted"):
            print attr
            for obj in getattr(session, attr):
                print "%s" % obj.__class__.__name__
                for col in obj.__mapper__.iterate_properties:
                    added, unchanged, deleted = attributes.get_history(
                        obj, col.key)
                    if added:
                        print "%s: %s" % (col.key, str(added))

SESSION = sessionmaker(
    bind=ENGINE,
    extension=(AuditListener())
)

class FooBar(BASE):
    __tablename__ = 'foobar'
    foo_id = Column('foo_id',
                    ForeignKey('foo.foo_id'),
                    primary_key=True)

    bar_id = Column('bar_id',
                    ForeignKey('bar.bar_id'),
                    primary_key=True)

class Foo(BASE):
    __tablename__ = 'foo'
    foo_id = Column('foo_id',
                    Integer,
                    primary_key=True)

    bars = relationship('Bar',
                        secondary=FooBar.__table__,
                        backref=backref('foos'))

class Bar(BASE):
    __tablename__ = 'bar'
    bar_id = Column('bar_id',
                    Integer,
                    primary_key=True)


def example():
    BASE.metadata.create_all(ENGINE, checkfirst=False)

    session = SESSION()
    f = Foo(foo_id=1)
    b1 = Bar(bar_id=1)
    b2 = Bar(bar_id=2)
    session.add(f)
    session.add(b1)
    session.add(b2)
    session.commit()

    # add relationship between f and b1 & b2:
    f.bars = [b1, b2]
    session.commit()

if __name__ == '__main__':
    example()

If you run this code you will see this output:

new
Bar
bar_id: [2]
Foo
foo_id: [1]
Bar
bar_id: [1]
dirty
deleted
new
dirty
Bar
Foo
bars: [<__main__.Bar object at 0x27b4910>, <__main__.Bar object at 
0x277b050>]
Bar
deleted

As you can see, when the many-to-many relationship is committed the output 
only denotes an update to the bars relationship of Foo. There is nothing 
indicating that 2 new FooBar entries were created.

Any ideas?

Thanks,

Mark.



Re: [sqlalchemy] Is it possible to obtain actual values inserted into database in a SessionExtension?

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 7:00 AM, Mark Bird mark.a.b...@gmail.com wrote:

 I am trying to write a generic auditing extension, but I have some 
 inconsistencies I'd like to iron out if possible.
 
 The problem I have is that when an object is updated via a relationship 
 rather than a Column attribute, the value returned from 
 attributes.get_history is a SQLAlchemy object, and not the value that is in 
 the database. This is more pronounced on a many-to-many relationship - the 
 secondary table isn't marked as new/dirty, even though that is the table that 
 will actually have data inserted into it. Instead the 2 tables joined by the 
 secondary table are marked as dirty, although no data is changed in those 
 tables. I understand this makes sense at the SQLAlchemy level - but is there 
 a way to get the actual database changes? 


yeah I’ve tried to stress in the documentation that using “secondary” in 
relationship() in conjunction with an ORM object that is actually mapped to 
that table will lead to confusion.  There is no history present for the table 
that you specify for “secondary”; it is only subject to inserts/deletes based 
on changes to the collection to which it refers, e.g. Foo.bars, and these 
inserts/deletes are calculated and executed within a closed flush action.  The 
FooBar class does not play a role of any kind in this operation since you have 
not instantiated a FooBar object.

You need to decide whether you want to work with Foo-FooBar-Bar objects, or 
with Foo-Bar directly with “secondary” in between, and then stick to just one 
or the other.
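
e.g., a rough sketch of the association-object style (the extra relationship 
here is a hypothetical addition to your classes), where the FooBar rows are 
created explicitly and therefore show up in session.new:

class Foo(BASE):
    __tablename__ = 'foo'
    foo_id = Column('foo_id', Integer, primary_key=True)
    # map the association rows directly instead of using "secondary"
    foobars = relationship('FooBar', backref='foo')

f = Foo(foo_id=1)
session.add(f)
# an explicit FooBar object lands in session.new, so the audit
# listener sees the insert into the association table
f.foobars.append(FooBar(bar_id=1))
session.commit()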



 
 A code example to illustrate this:
 
 from sqlalchemy.types import Integer
 from sqlalchemy import Column, ForeignKey, create_engine
 from sqlalchemy.orm import relationship, backref, attributes, sessionmaker
 from sqlalchemy.ext.declarative import declarative_base
 from sqlalchemy.orm.interfaces import SessionExtension
 
 BASE = declarative_base()
 
 ENGINE = create_engine(sqlite://)
 
 class AuditListener(SessionExtension):
 def after_flush(self, session, *args):
 for attr in (new, dirty, deleted):
 print attr
 for obj in getattr(session, attr):
 print %s % obj.__class__.__name__
 for col in obj.__mapper__.iterate_properties:
 added, unchanged, deleted = attributes.get_history(obj, 
 col.key)
 if added:
 print %s: %s % (col.key, str(added))
 
 SESSION = sessionmaker(
 bind=ENGINE,
 extension=(AuditListener())
 )
 
 class FooBar(BASE):
 __tablename__ = 'foobar'
 foo_id = Column('foo_id',
 ForeignKey('foo.foo_id'),
 primary_key=True)
 
 bar_id = Column('bar_id',
 ForeignKey('bar.bar_id'),
 primary_key=True)
 
 class Foo(BASE):
 __tablename__ = 'foo'
 foo_id = Column('foo_id',
 Integer,
 primary_key=True)
 
 bars = relationship('Bar',
 secondary=FooBar.__table__,
 backref=backref('foos'))
 
 class Bar(BASE):
 __tablename__ = 'bar'
 bar_id = Column('bar_id',
 Integer,
 primary_key=True)
 
 
 def example():
 BASE.metadata.create_all(ENGINE, checkfirst=False)
 
 session = SESSION()
 f = Foo(foo_id=1)
 b1 = Bar(bar_id=1)
 b2 = Bar(bar_id=2)
 session.add(f)
 session.add(b1)
 session.add(b2)
 session.commit()
 
 # add relationship between f and b1  b2:
 f.bars = [b1, b2]
 session.commit()
 
 if __name__ == '__main__':
 example()
 
 If you run this code you will see this output:
 
 new
 Bar
 bar_id: [2]
 Foo
 foo_id: [1]
 Bar
 bar_id: [1]
 dirty
 deleted
 new
 dirty
 Bar
 Foo
 bars: [__main__.Bar object at 0x27b4910, __main__.Bar object at 
 0x277b050]
 Bar
 deleted
 
 As you can see, the output when the many-to-many relationship is committed is 
 to denote an update to the bars relationship of Foo. There is nothing 
 indicating there are 2 new FooBar entries created.
 
 Any ideas?
 
 Thanks,
 
 Mark.
 
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy.
 For more options, visit https://groups.google.com/groups/opt_out.





Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 4:57 AM, Achim do...@procoders.net wrote:

 
 
 Sounds like a still quite simple, but hopefully much faster solution. Do you 
 agree?


sure that should be fine, if you can pre-calc your PKs.   It just won’t work 
under any kind of concurrency, as in such a situation there could be 
interleaved INSERTs from different processes.






Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 11:52 AM, Claudio Freire klaussfre...@gmail.com wrote:

 On Wed, Nov 13, 2013 at 1:45 PM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 
 
 
 
 sure that should be fine, if you can pre-calc your PKs.   It just won’t work
 under any kind of concurrency, as in such a situation there could be
 interleaved INSERTs from different processes.
 
 
 Postgresql sequences already handle that kind of concurrency scenario.


how exactly? if two transactions T1 and T2 both pull a number from a sequence, 
T1 gets 40, T2 gets 41, then we pre-calc 10 inserts that have not yet occurred 
for each, T1 has 40-50, T2 has 41-51, the number is totally wrong for both - in 
reality it would be some random distribution of 40-60 between T1 and T2.   No?






Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Achim
According to 
http://www.postgresql.org/docs/9.2/static/functions-sequence.html, 
sequences are non-transactional:

"Advance the sequence object to its next value and return that value. This 
is done atomically: even if multiple sessions execute nextval concurrently, 
each will safely receive a distinct sequence value.
...
To avoid blocking concurrent transactions that obtain numbers from the 
same sequence, a nextval operation is never rolled back; that is, once a 
value has been fetched it is considered used, even if the transaction that 
did the nextval later aborts. This means that aborted transactions might 
leave unused holes in the sequence of assigned values."



Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Claudio Freire
On Wed, Nov 13, 2013 at 2:04 PM, Michael Bayer mike...@zzzcomputing.com wrote:



 how exactly, if two transactions T1 and T2 both pull a number from a 
 sequence, T1 gets 40, T2 gets 41, then we pre-calc 10 inserts that have not 
 yet occurred for each, T1 has 40-50, T2 has 41-51, the number is totally 
 wrong for both - in reality it would be some random distribution of 40-60 
 between T1 and T2.   No ?


No, you ask the same sequence for 10 ids, and the sequence allocates 
T1 40-49 and T2 50-59.



Re: [sqlalchemy] Speed up bulk inserts

2013-11-13 Thread Michael Bayer

On Nov 13, 2013, at 12:22 PM, Claudio Freire klaussfre...@gmail.com wrote:

 
 
 No, you ask for 10 ids to the same sequence, and the sequence
 allocates T1 40-49, and T2 50-59

oh right, I’m thinking of the trick as applied to MySQL.   Sure, if you 
actually run the seq’s, you’re fine.




Re: [sqlalchemy] A noop query?

2013-11-13 Thread Michael Bayer

On Nov 12, 2013, at 6:45 AM, warvariuc victor.varvar...@gmail.com wrote:

 I have a model with a relationship:
 
 class Customer(Base):
 
     __tablename__ = 'customer'
     ...
     customer_wishes = relationship('CustomerWish', back_populates='customer',
                                    lazy='subquery')
 
 When I query a customer from the db:
 
 15:32:00INFO [sqlalchemy.colorsql]:  0.005s  SELECT * FROM customer WHERE 
 customer.id = 1599622
 15:32:00INFO [sqlalchemy.colorsql]:  0.004s  SELECT * FROM (SELECT 
 customer.id AS customer_id FROM customer WHERE customer.id = 1599622) AS 
 anon_1
 JOIN customer_wish ON anon_1.customer_id = customer_wish.customer_id
 ORDER BY anon_1.customer_id
 
 This subquery (SELECT customer.id AS customer_id FROM customer WHERE 
 customer.id = 1599622) looks strange. Not sure if it affects performance, 
 or whether it has any meaning at all?

it’s not ideal, but the loading system currently doesn’t try to second-guess 
that the WHERE clause actually means the whole subquery isn’t needed.   There’s 
another kind of loading I may add someday where, instead of a subquery, we do an 
IN with the IDs we already have; that would alleviate this particular issue.  
The primary key is indexed, so the lookup isn’t bad here, though overall the 
subquery makes life a bit harder for the DB (much harder if you’re on MySQL).
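
in the meantime you can do that IN load by hand; a rough sketch with the 
models from your question (assuming a CustomerWish.customer_id column):

wishes = (
    session.query(CustomerWish)
    .filter(CustomerWish.customer_id.in_([1599622]))
    .all()
)
# emits: SELECT ... FROM customer_wish
#        WHERE customer_wish.customer_id IN (1599622)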







Re: [sqlalchemy] pre-populated instances

2013-11-13 Thread Michael Bayer
typically if I had “create_all()” running in my app I’d probably have a 
function like “install_payload()” right after it that puts in all the data we 
need.

actually what I usually do is a pipeline approach:

install_app() -> create_tables(), install_payload_x(), install_payload_y(), 
etc.
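
a rough sketch of that pipeline (Language as in your message; the base class 
and everything else here is hypothetical):

from sqlalchemy.orm import Session

def install_payload_languages(session):
    # seed rows the app always expects to exist
    for code in ('en_US', 'pt_BR', 'es_ES'):
        session.add(Language(code))

def install_app(engine):
    # schema first, then each payload step, then one commit
    Base.metadata.create_all(engine)
    session = Session(bind=engine)
    install_payload_languages(session)
    session.commit()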



On Nov 11, 2013, at 7:54 AM, Richard Gerd Kuesters rich...@humantech.com.br 
wrote:

 Hi all,
 
 I have a CI that generates my app database. So far so good.
 
 I would like to tweak them a bit. Many of these instances (like Language) 
 can have pre-populated values (en_US, pt_BR, es_ES, etc.). What's the 
 recommended technique (or the one closest to Python+SA) to accomplish this?
 
 For now I'm using the after_create event, but it doesn't work with 
 declarative (unless Language.__table__ is used), and I would rather use the 
 natural way of creating objects: lang = Language('pt_BR'); 
 session.add(lang)
 
 
 Any ideas?
 
 Kind regards,
 Richard.
 





Re: [sqlalchemy] Mixing ORM and set-returning-functions (e.g. generate_series)

2013-11-13 Thread Daniel Grace
I know it's been a few days, but I've finally had time to actually get back 
to this

On Thursday, November 7, 2013 6:18:57 PM UTC-8, Michael Bayer wrote:

 On Nov 7, 2013, at 6:46 PM, Daniel Grace thisgen...@gmail.com wrote:

[...]

 that’s basically equivalent.   If you want the identical syntax with that 
 function, the @compiles construct would be a start, but you’d be probably 
 making a custom FromClause subclass which is a little involved.


I'd think that a generic form of this might actually be a way to go, but 
I'm not (yet) familiar enough with sqlalchemy's internals to make a lot of 
headway in designing any 'sane' solution.

It's worth noting PostgreSQL supports column-level aliasing on anything, 
not just a set-returning-function, such that the following is valid:

CREATE TABLE foo ( id SERIAL NOT NULL);
INSERT INTO foo ( id ) VALUES (1), (2), (3);
SELECT bar.baz FROM foo AS bar(baz);

That said, it's not cleanly usable in most cases -- you can't alias a 
column by name, so you have to know the exact order columns appear in 
(which you might not know if you didn't reflect nor create the table -- a 
column definition that doesn't appear in the model might be in the table, 
for instance).

That said, it'd seem like a generic "Set Returning Function" implementation 
would need to do the following:
* Subclass from GenericFunction to track data types of input values and 
whatever voodoo is required for bind parameters.  (I think this also gives 
us the behavior of "SELECT generate_series(...)" being treated like "SELECT 
FROM ... generate_series(...)" for free)
* Subclass from FromClause or perhaps even Alias to track types and names 
of output values in self.c
* Have an alternate/extended 'alias' implementation capable of defining 
aliases on a per-column level, so "AS alias" becomes "AS alias(column_alias, 
...)"

So usage might be something like:

series = sql.srfunc.generate_series(1, 10, output=[Column('value', 
Integer())])
foo = series.alias("foo").column_alias('value', 'v')
bar = foo.alias("bar")  # I'm assuming that aliasing an existing alias Does 
The Right Thing(tm), I've never tried it.

session.query(series.c.value) 
# SELECT generate_series.value FROM generate_series(1, 10) AS 
generate_series(value)

session.query(foo.c.v)
# SELECT foo.v FROM generate_series(1, 10) AS foo(v)

session.query(foo.c.v + bar.c.v).select_from(foo).join(bar, foo.c.v < 
bar.c.v) 
# SELECT foo.v + bar.v FROM generate_series(1, 10) AS foo(v) JOIN 
generate_series(1, 10) AS bar(v) ON foo.v < bar.v;



Thoughts?

-- Daniel 



Re: [sqlalchemy] Mixing ORM and set-returning-functions (e.g. generate_series)

2013-11-13 Thread Michael Bayer
On Nov 13, 2013, at 6:12 PM, Daniel Grace thisgenericn...@gmail.com wrote:

 [...]

see attached for demo, should get you started

from sqlalchemy.sql.selectable import FromClause

from sqlalchemy import select, func, Column, Integer
from sqlalchemy.ext.compiler import compiles

class PGAsFunc(FromClause):
    named_with_column = True

    def __init__(self, func, columns, alias=None):
        self.func = func
        self._column_args = columns
        self.name = alias or func.name
        self._from_objects = [self]

    def alias(self, name):
        return PGAsFunc(self.func, self.c, name)

    def column_alias(self, from_, to):
        pg = PGAsFunc(
            self.func,
            self.c,
            self.name)
        # this isn't really a built-in operation so klunk it through
        pg.c
        pg._columns[to] = pg._columns[from_]._make_proxy(pg, to)
        pg._columns.remove(pg.c[from_])
        return pg

    def _populate_column_collection(self):
        for c in self._column_args:
            c._make_proxy(self)

class MyFuncThing(object):
    def __getattr__(self, key):
        def create_function(*args, **kw):
            output = kw['output']
            return PGAsFunc(getattr(func, key)(*args), output)
        return create_function


@compiles(PGAsFunc)
def compile(element, compiler, **kw):
    col_kwargs = kw.copy()
    col_kwargs['include_table'] = False
    return "%s AS %s(%s)" % (
        compiler.process(element.func, **kw),
        element.name,
        ", ".join(
            compiler.process(expr, **col_kwargs)
            for expr in element.c
        )
    )


series = MyFuncThing().generate_series(1, 10, output=[Column('value', Integer())])
foo = series.alias("foo").column_alias('value', 'v')
bar = foo.alias("bar")  # I'm assuming that aliasing an existing alias Does The Right Thing(tm), I've never tried it.

from sqlalchemy.orm import Session
session = Session()

print session.query(series.c.value)
print session.query(foo.c.v)
print session.query(foo.c.v + bar.c.v).select_from(foo).join(bar, foo.c.v < bar.c.v)

