Re: [sqlalchemy] Bulk Inserts and Unique Constraints

2014-03-24 Thread James Meneghello
Thanks for the quick reply!

This seems to work pretty well. I took out the batching (as it's already 
batched at a higher level) and modified it to suit the insertion of 
children as well (and reduced the unique constraint to a single field), and it 
appears to work.

with db_session() as db:
    # map subject -> existing Entry, with segments eagerly loaded
    existing_entries = dict(
        (entry.subject, entry) for entry in
        db.query(Entry).filter(Entry.subject.in_(entries.keys())).options(subqueryload('segments')).all()
    )

    segment_inserts = []
    for subject, entry in entries.items():
        existing_entry = existing_entries.get(subject, None)
        if existing_entry:
            # entry already exists: only queue segments we don't have yet
            segments = dict((s.segment, s) for s in existing_entry.segments)
            for segment_number, segment in entry['segments'].items():
                if int(segment_number) not in segments:
                    segment['entry_id'] = existing_entry.id
                    segment_inserts.append(segment)
        else:
            # new entry: insert it individually so we get its primary key back
            entry_id = engine.execute(Entry.__table__.insert(), entry).inserted_primary_key
            for segment in entry['segments'].values():
                segment['entry_id'] = entry_id[0]
                segment_inserts.append(segment)

    engine.execute(Segment.__table__.insert(), segment_inserts)


For 20,000 segments, this ends up being about 45 seconds and 1650 queries - 
2 to select all the entries and segments, 1 to insert the segments and the 
rest to insert parts. From here, however, I rewrote it a bit:

with db_session() as db:
    # first pass: find which entries already exist, insert the rest in bulk
    existing_entries = dict(
        (entry.subject, entry) for entry in
        db.query(Entry).filter(Entry.subject.in_(entries.keys())).all()
    )

    entry_inserts = []
    for subject, entry in entries.items():
        existing_entry = existing_entries.get(subject, None)
        if not existing_entry:
            entry_inserts.append(entry)

    engine.execute(Entry.__table__.insert(), entry_inserts)

    # second pass: re-select so every entry (old and new) has an id, then build the segment rows
    existing_entries = dict(
        (entry.subject, entry) for entry in
        db.query(Entry).filter(Entry.subject.in_(entries.keys())).options(subqueryload('segments')).all()
    )

    segment_inserts = []
    for subject, entry in entries.items():
        existing_entry = existing_entries.get(subject, None)
        if existing_entry:
            segments = dict((s.segment, s) for s in existing_entry.segments)
            for segment_number, segment in entry['segments'].items():
                if int(segment_number) not in segments:
                    segment['entry_id'] = existing_entry.id
                    segment_inserts.append(segment)
        else:
            log.error('i\'ve made a huge mistake')

    engine.execute(Segment.__table__.insert(), segment_inserts)

This ends up being about 19 seconds, 6 queries for a clean dump, and a bit 
less if the table is already populated. Removing the unique indexes on both 
the entries and segments tables and replacing them with standard indexes 
saves about a second in a full dump, and about 6 seconds for an update. I'm 
pretty happy with where it is now, and I suspect most of the time (aside 
from the two insert calls) is being spent in Python. That said, if you have 
any tips for improvements I'd be all ears.

Thanks for the help!

On Monday, 24 March 2014 09:19:25 UTC+8, Michael Bayer wrote:


 On Mar 23, 2014, at 11:33 AM, James Meneghello muro...@gmail.com wrote:

 I'm having a few issues with unique constraints and bulk inserts. The 
 software I'm writing takes data from an external source (a lot of it, 
 anywhere from 1,000 rows per minute to 100-200k+), crunches it down into 
 its hierarchy and saves it to the DB, to be aggregated in the background. 
 The function handling the initial DB save is designed to work with about 
 20-50k rows at a time - very little modification takes place, it's pretty 
 much just grabbed and thrown into the table. Obviously the amount of data 
 being saved somewhat excludes the use of the ORM in this particular table, 
 but there are a number of other tables that benefit from using the ORM. 
 Hence, the small stuff uses the ORM and the big stuff uses the Core.

 The main problem I'm having is with the initial save. The data comes in 
 unordered and sometimes contains duplicates, so there's a UniqueConstraint 
 on Entry on sub, division, created. Unfortunately, this hampers the bulk 
 insert - if there's a duplicate, it rolls back the entire insert and hence 
 the entries aren't available to be referenced by the segments later. 
 Obviously, capturing it in a try/catch would skip the whole block as well. 
 Both Entry and Segment have the same problem - there are often duplicate 
 segments. Since there's a large amount of data being pushed through it, I 
 assume it's impractical to insert the elements individually - while there's 
 only 100-200 

Re: [sqlalchemy] Bulk Inserts and Unique Constraints

2014-03-24 Thread James Meneghello
Oops, I should add - the reason I can't use an itertools counter to 
pre-assign IDs is because the table is potentially being dumped to by 
multiple scripts, which is why I have to commit the parts prior to the 
segments (since engine.execute can't return multiple insert_ids).

[sqlalchemy] What is the rationale of having to manually set up a relationship between two tables?

2014-03-24 Thread Bao Niu
Suppose we have two tables in an existing database, user and address. 
There is a one-to-many relationship between these two tables with a 
foreign key user.id == address.user_id.
Now we *reflect* this schema directly from the database:

from sqlalchemy.ext.declarative import declarative_base
import sqlalchemy.orm

Base = declarative_base(E)

class User(Base):
    __tablename__ = 'user'
    __table_args__ = {'autoload': True}
    addresses = sqlalchemy.orm.relationship("Address", backref="user")

class Address(Base):
    __tablename__ = 'address'
    __table_args__ = {'autoload': True}

To me, it is a mystery why the almighty SA can autoload everything, but not 
an obvious relationship (the addresses line above). What benefit is there for 
the user in manually defining such a simple and unambiguous relationship? 
Can't it be done automatically, i.e., leaving out that line?
Will this feature likely be included in future versions, say, sqlalchemy 
1.0?



Re: [sqlalchemy] Bulk Inserts and Unique Constraints

2014-03-24 Thread Cosmia Luna
The INSERT statement in PostgreSQL supports RETURNING; read this: 
http://docs.sqlalchemy.org/en/rel_0_8/core/dml.html#sqlalchemy.sql.expression.Insert.returning
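
For example, a rough sketch against the Entry table from earlier in the 
thread (the column values here are only placeholders):

stmt = Entry.__table__.insert().returning(Entry.__table__.c.id)
result = engine.execute(stmt, {'subject': 'some subject'})  # single-row execute
new_id = result.scalar()  # the id generated for this row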

On Monday, March 24, 2014 2:43:46 PM UTC+8, James Meneghello wrote:

 Oops, I should add - the reason I can't use an itertools counter to 
 pre-assign IDs is because the table is potentially being dumped to by 
 multiple scripts, which is why I have to commit the parts prior to the 
 segments (since engine.execute can't return multiple insert_ids).


Re: [sqlalchemy] single table hierarchy: disable polymorphism and load all as base class

2014-03-24 Thread Pau Tallada
No, polymorphic_on=None does not have any effect :(

I'll keep trying :P

Thanks!

Pau.


2014-03-17 18:56 GMT+01:00 Michael Bayer mike...@zzzcomputing.com:

 er, guessing, polymorphic_on=None also for that class? not sure of the
 mapper’s tolerance.


 On Mar 17, 2014, at 12:10 PM, Pau Tallada tall...@pic.es wrote:

 Hi, I tried this, but then it selects WHERE table.type IN (NULL) :(

 class NonPolymorphicClass(MyBaseClass):
     __mapper_args__ = {
         'concrete' : True
     }

 I'll try some more things, like overriding the __new__ method on the base
 class, to create subclasses if they are available, and instances of the
 base class if not.

 Thank you very much anyway :)

 Pau.


 2014-03-17 15:44 GMT+01:00 Michael Bayer mike...@zzzcomputing.com:

 maybe make a subclass of the polymorphic base, and just put __concrete__ 
 = True in the mapper_args. Skip the non-primary part.




 On Mar 17, 2014, at 8:45 AM, Pau Tallada tall...@pic.es wrote:

  Hi,

 Can it be done in declarative?
 I've tried several ways, but I cannot find a working one :/
 Basically, I think I need to redefine some self-referential
 relationships, as they link against the polymorphic class.


 Meta = declarative_meta()

 class NonPolymorphicClass(MyBaseClass):
     __mapper_args__ = {
         'non_primary' : True
     }

 == ArgumentError: Inheritance of non-primary mapper for class '
 NonPolymorphicClass' is only allowed from a non-primary mapper




 class NonPolymorphicClass(MyBaseClass):
     __table__ = MyBaseClass.__table__
     __mapper_args__ = {
         'non_primary' : True
     }

 == ArgumentError: Inheritance of non-primary mapper for class '
 NonPolymorphicClass' is only allowed from a non-primary mapper




 class NonPolymorphicClass(Meta):
     __mapper_args__ = {
         'non_primary' : True
     }

 == InvalidRequestError: Class <class '__main__.NonPolymorphicClass'>
  does not have a __table__ or __tablename__ specified and does not inherit
  from an existing table-mapped class.




 class NonPolymorphicClass(Meta):
     __mapper_args__ = {
         'non_primary' : True
     }

 == InvalidRequestError: Class <class '__main__.NonPolymorphicClass'>
  has no primary mapper configured. Configure a primary mapper first before
  setting up a non primary Mapper.


 2014-03-17 13:09 GMT+01:00 Pau Tallada tall...@pic.es:

 Great!

 I think it works for my needs :D

 Thank you very much!

 Pau.


 2014-03-17 12:28 GMT+01:00 Michael Bayer mike...@zzzcomputing.com:


  On Mar 17, 2014, at 6:48 AM, Pau Tallada tall...@pic.es wrote:

 Hi everyone :)

 I have an application with some kind of dynamic model.
 Several plugins define subclasses of a common base class and override
 their behaviour, but they do not change any column.

 In some scenarios, not all the plugins are loaded, so not all the
 subclasses are present/defined in the Metadata, but I still need to query
 those instances and access their columns.

 Is there any option/feature/trick to temporarily disable polymorphism
 in a single query (or globally for a session or engine) and load all
 classes as instances of the base class?



 you can use a non primary mapper:

 m = mapper(MyBase, my_base_table, non_primary=True)

 s.query(m).all()




 The other option is to hack some kind of on-demand polymorphism myself
 into the __new__ method of the Base class.

 Thanks!

 Pau.

 --
 --
 Pau Tallada Crespí
 Dep. d'Astrofísica i Cosmologia
 Port d'Informació Científica (PIC)
 Tel: +34 93 586 8233
 --









 --
 --
 Pau Tallada Crespí
 Dep. d'Astrofísica i Cosmologia
 Port d'Informació Científica (PIC)
 Tel: +34 93 586 8233
 --




 --
 --
 Pau Tallada Crespí
 Dep. d'Astrofísica i Cosmologia
 Port d'Informació Científica (PIC)
 Tel: +34 93 586 8233
 --



Re: [sqlalchemy] What is the rationale of having to manually set up a relationship between two tables?

2014-03-24 Thread Simon King
On Mon, Mar 24, 2014 at 6:48 AM, Bao Niu niuba...@gmail.com wrote:
 Suppose we have two tables in an existing database, user and address.
 There is a one-to-many relationships between these two tables with a foreign
 key user.id==address_user_id.
 Now we *reflect* this schema directly from the database:

 from sqlalchemy.ext.declarative import declarative_base
 Base = declarative_base(E)

 Class User(Base):
 __tablename__ = 'user'
 __table_args__ = {'autoload': True}
 addresses = sqlalchemy.orm.relationship(Address,  backref=user)

 Class Address(Base):
 __tablename__ = 'address'
 __table_args__ = {'autoload': True}

 To me, it is a mystery why the almighty SA can autoload everything, but not
 an obvious relationship (in red ink). What benefit is it for the user to
 manually define such a simple and unambiguous relationship? Can't it be
 automatically done, i.e., leaving out the red part?
 Will this feature likely be included in future versions, say, sqlalchemy
 1.0?


You might be interested in the experimental automap extension:

  http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/automap.html
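
For example, a minimal sketch (the connection URL is a placeholder, and this
assumes the user/address schema described above):

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine("postgresql://scott:tiger@localhost/test")  # placeholder URL

Base = automap_base()
Base.prepare(engine, reflect=True)  # reflect the tables and generate mapped classes

User = Base.classes.user
Address = Base.classes.address
# automap generates the relationship for you as well,
# e.g. User.address_collection and Address.user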

Hope that helps,

Simon



[sqlalchemy] after_delete callback isn't called with query(...).delete() syntax

2014-03-24 Thread Igor Loskutoff
Hey, 

I'm facing an issue with the after_delete callback.
It works when I do

y = DBSession.query(Yarns).filter_by(user_id=user.id, slug=yarn_id).first()
DBSession.delete(y)
DBSession.flush()


but isn't called on 

DBSession.query(Yarns).filter_by(user_id=user.id, slug=yarn_id).delete() 

 
where DBSession = 
scoped_session(sessionmaker(extension=ZopeTransactionExtension()))

The callback itself is set up like this: sqlalchemy.event.listen(Yarns, 
'after_delete', yarn_index_remove)

The model class looks like this:

class Yarns(sqlalchemy.ext.declarative.declarative_base()):
    ...

I'd really like to use the more succinct first syntax instead of the current one.

Thanks,
Igor



Re: [sqlalchemy] after_delete callback isn't called with query(...).delete() syntax

2014-03-24 Thread Simon King
On Mon, Mar 24, 2014 at 1:05 PM, Igor Loskutoff
igor.loskut...@gmail.com wrote:
 Hey,

 I'm faced issue with after_delete callback call.
 It works when I do

 y = DBSession.query(Yarns).filter_by(user_id=user.id,
 slug=yarn_id).first()
 DBSession.delete(y)
 DBSession.flush()


 and doesn't being called on

 DBSession.query(Yarns).filter_by(user_id=user.id, slug=yarn_id).delete()


 where DBSession =
 scoped_session(sessionmaker(extension=ZopeTransactionExtension()))

 Callback itself is being set like that: sqlalchemy.event.listen(Yarns,
 'after_delete', yarn_index_remove)

 Model class looks like that:

 class Yarns(sqlalchemy.ext.declarative.declarative_base()):
 ...

 I'd really like use more succinct first syntax instead of current.

 Thanks,
 Igor

I think the problem is that after_delete is only used for instances
which have actually been loaded from the DB. Query.delete() doesn't
load the matching rows from the DB, so the event doesn't fire.

There is also a (session, rather than mapper) event called
after_bulk_delete which you might be able to use:

  
http://docs.sqlalchemy.org/en/rel_0_9/orm/events.html#sqlalchemy.orm.events.SessionEvents.after_bulk_delete

You'd probably have to examine the delete_context.query object to see
which class is being deleted, and figuring out exactly which rows have
been deleted might also be a problem (since the filter conditions
passed to the query could be arbitrarily complicated).
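
A rough sketch of such a listener (remove_from_index is a hypothetical
stand-in for whatever yarn_index_remove does, not code from the original post):

from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, 'after_bulk_delete')
def receive_after_bulk_delete(delete_context):
    # delete_context.query is the Query that was bulk-deleted;
    # column_descriptions tells you which class it was issued against
    if delete_context.query.column_descriptions[0]['type'] is Yarns:
        remove_from_index(delete_context)  # hypothetical re-indexing hook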

Simon



Re: [sqlalchemy] after_delete callback isn't called with query(...).delete() syntax

2014-03-24 Thread Michael Bayer

On Mar 24, 2014, at 9:05 AM, Igor Loskutoff igor.loskut...@gmail.com wrote:

 Hey, 
 
 I'm faced issue with after_delete callback call. 
 It works when I do
 
 y = DBSession.query(Yarns).filter_by(user_id=user.id, slug=yarn_id).first()
 DBSession.delete(y)
 DBSession.flush()
 
 and doesn't being called on 
 
 DBSession.query(Yarns).filter_by(user_id=user.id, slug=yarn_id).delete() 
  
 where DBSession = 
 scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
 
 Callback itself is being set like that: sqlalchemy.event.listen(Yarns, 
 'after_delete', yarn_index_remove)
 
 Model class looks like that:
 
 class Yarns(sqlalchemy.ext.declarative.declarative_base()):  
 ...
 
 I'd really like use more succinct first syntax instead of current

after_delete is only called for unit-of-work deletes.   The aggregate 
delete() doesn't have the same access to object instances that the UOW does as 
it may match any number (or no) objects, none of which may be locally present.  
To catch those events use after_bulk_delete().

http://docs.sqlalchemy.org/en/rel_0_9/orm/events.html?highlight=after_bulk_delete#sqlalchemy.orm.events.SessionEvents.after_bulk_delete



 
 Thanks,
 Igor
 



Re: [sqlalchemy] Bulk Inserts and Unique Constraints

2014-03-24 Thread Michael Bayer
RETURNING doesn't work with DBAPI's executemany style of execution, however, 
which is what conn.execute(stmt, [list of parameter sets]) calls.
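
Roughly, the trade-off looks like this (a sketch with placeholder row dicts,
not code from the original post):

# executemany form: many rows per statement, but no RETURNING /
# inserted_primary_key available on the result
engine.execute(Entry.__table__.insert(), [row1, row2, row3])

# single-row form: RETURNING works, at the cost of one round trip per row
stmt = Entry.__table__.insert().returning(Entry.__table__.c.id)
new_id = engine.execute(stmt, row1).scalar()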



On Mar 24, 2014, at 5:33 AM, Cosmia Luna cosm...@gmail.com wrote:

 INSERT statement of postgresql supports RETURNING, read this 
 http://docs.sqlalchemy.org/en/rel_0_8/core/dml.html#sqlalchemy.sql.expression.Insert.returning
 
 On Monday, March 24, 2014 2:43:46 PM UTC+8, James Meneghello wrote:
 Oops, I should add - the reason I can't use an itertools counter to 
 pre-assign IDs is because the table is potentially being dumped to by 
 multiple scripts, which is why I have to commit the parts prior to the 
 segments (since engine.execute can't return multiple insert_ids).
 

[sqlalchemy] Re: Bulk Inserts and Unique Constraints

2014-03-24 Thread James Meneghello
That's effectively what I'm doing now. I'm not sure there's much I can 
speed up at this point - the SELECTs take about 0.05s, it's just the 
INSERTs taking the bulk of the time - 11-15s depending on the number of rows. 
That said, I'm still running on development hardware and there'll be a 
significant boost once it's on proper hardware.

On Monday, 24 March 2014 22:44:09 UTC+8, Jonathan Vanasco wrote:

 The data comes in unordered and sometimes contains duplicates, so there's 
 a UniqueConstraint on Entry on sub, division, created.

 Have you tried pre-processing the list first ?

 I've had similar situations, when dealing with browser , user and app 
 analytics.  

 I normally do a first pass to restructure the raw log file and note any 
 'selects' i might need to associate the records to; then I lock tables, 
 precache the selects, and do all the inserts.  the speed pickups have been 
 great.




[sqlalchemy] Re: Bulk Inserts and Unique Constraints

2014-03-24 Thread Jonathan Vanasco
The data comes in unordered and sometimes contains duplicates, so there's 
a UniqueConstraint on Entry on sub, division, created.

Have you tried pre-processing the list first?

I've had similar situations when dealing with browser, user, and app 
analytics.

I normally do a first pass to restructure the raw log file and note any 
'selects' I might need to associate the records to; then I lock tables, 
precache the selects, and do all the inserts. The speed pickups have been 
great.
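
A rough sketch of that kind of pre-pass, keyed on the unique constraint
mentioned earlier in the thread (the field names are only illustrative):

# collapse duplicates in Python before touching the database
seen = {}
for row in raw_rows:
    key = (row['subject'], row['division'], row['created'])
    seen[key] = row  # last duplicate wins; pick whichever policy fits
deduped = list(seen.values())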



[sqlalchemy] Re: What is the rationale of having to manually set up a relationship between two tables?

2014-03-24 Thread Jonathan Vanasco
That's a really simple use case -- you want the primary key to be a 1:1 
relationship, and you don't have potentially competing fields.

But this gets complicated when:

- You want User.addresses to be a single item (User.address)
- You don't want User.addresses to just be all the addresses -- but to be 
filtered on is_current_address = True
- User has a column named 'address' or 'addresses'; the join would 
want to be named email_addresses or postal_address.

It's not that SqlAlchemy can't do this stuff -- it's that it chooses not 
to, so you have full control of your schema.  That's why it's in the 
extensions, not the core.





[sqlalchemy] Re: Bulk Inserts and Unique Constraints

2014-03-24 Thread Jonathan Vanasco
Since you're using Postgres... have you considered using Python to generate 
a COPY file?

SQLAlchemy doesn't seem to support it natively... maybe via 'text', but 
your underlying psycopg2 driver does.

It's way, way faster. I've found it significantly faster than dropping 
fkeys and using prepared statements.
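
For reference, a minimal sketch of that approach through the raw psycopg2
connection (assuming Python 3; the table and column names are placeholders):

import io

buf = io.StringIO()
for row in segment_inserts:
    buf.write('\t'.join([str(row['entry_id']), str(row['segment'])]) + '\n')
buf.seek(0)

raw_conn = engine.raw_connection()  # drop down to the psycopg2 connection
try:
    cur = raw_conn.cursor()
    cur.copy_from(buf, 'segments', columns=('entry_id', 'segment'))
    raw_conn.commit()
finally:
    raw_conn.close()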



[sqlalchemy] joinedload()

2014-03-24 Thread Philip Scott
Hello all,

I am trying to optimize a query I have, and have noticed some strange
behavior. I have a table called 'Trade'. A trade may optionally have a
commission, which itself is a trade. This is the core of it:


class Trade(Base, CCPTableMixin):
    __tablename__ = "trade"

    id = Column(Integer, primary_key=True)

    trade_commission = Table("trade_commission",
        Base.metadata,
        Column("id_trade_victim", Integer, ForeignKey(id),
               primary_key=True),
        Column("id_trade_cash", Integer, ForeignKey(id),
               primary_key=True))

    commission = relationship(trade_class_name,
                              secondary=trade_commission,
                              primaryjoin=(id == trade_commission.c.id_trade_victim),
                              secondaryjoin=(id == trade_commission.c.id_trade_cash),
                              uselist=False,
                              backref=backref("commission_for",
                                              uselist=False))


Now, in a particular query, if a trade happens to be a commission trade, I
wish to eagerly load the original 'victim' trade, so I have something like
this:

query = session().query(Trade)
query = query.options(joinedload("commission_for"))

I noticed things were going a bit slowly, and to my surprise when I turned
on query tracing I could see many individual queries. It turns out that the
joined loading was working in general; however, if 'commission_for' was
None (e.g. the trade is not a commission trade), SQLAlchemy was issuing a
query to re-populate the (empty) collection.

Interestingly, when I switched it to 'subqueryload' - it didn't do this and
correctly eagerly loaded the empty collection. However, I have quite an
expensive query to select the list of trades in the first place which I
would rather not do twice, which is required for a subquery eager load. (It
would be really cool if you could select the primary keys into a temporary
table or something when doing a subquery load, but I guess that would be
quite difficult to do in a DB independent way)

Is this a bug, or perhaps some expected side effect of the joined load?

I am using SQLAlchemy 0.9.3. If it helps, I could knock up a working example
in short order.

Cheers, and keep up the fine work!

- Philip



Re: [sqlalchemy] joinedload()

2014-03-24 Thread Michael Bayer

On Mar 24, 2014, at 2:09 PM, Philip Scott safetyfirstp...@gmail.com wrote:

 
 Is this a bug, or perhaps some expected side effect of the joined load?

seemed like something that might be possible but the scalar loader is 
initializing the attribute to None to start with, here's a simple test that 
doesn't show your behavior, so see if you can just modify this one to show what 
you are seeing.  note we only need to see that 'bs' is in a1.__dict__ to 
prevent a lazyload.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    bs = relationship("B", secondary=Table('atob', Base.metadata,
                Column('aid', ForeignKey('a.id')),
                Column('bid', ForeignKey('b.id'))
            ),
            uselist=False)

class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

sess = Session(e)
sess.add(A())
sess.commit()
sess.close()

a1 = sess.query(A).options(joinedload("bs")).first()
assert 'bs' in a1.__dict__
assert a1.__dict__['bs'] is None
assert a1.bs is None





Re: [sqlalchemy] What is the rationale of having to manually set up a relationship between two tables?

2014-03-24 Thread Bao Niu
Compared to SqlSoup, is this extension more recommended?


On Mon, Mar 24, 2014 at 3:32 AM, Simon King si...@simonking.org.uk wrote:

 On Mon, Mar 24, 2014 at 6:48 AM, Bao Niu niuba...@gmail.com wrote:
  Suppose we have two tables in an existing database, user and address.
  There is a one-to-many relationships between these two tables with a
 foreign
  key user.id==address_user_id.
  Now we *reflect* this schema directly from the database:
 
  from sqlalchemy.ext.declarative import declarative_base
  Base = declarative_base(E)
 
  Class User(Base):
  __tablename__ = 'user'
  __table_args__ = {'autoload': True}
  addresses = sqlalchemy.orm.relationship(Address,  backref=user)
 
  Class Address(Base):
  __tablename__ = 'address'
  __table_args__ = {'autoload': True}
 
  To me, it is a mystery why the almighty SA can autoload everything, but
 not
  an obvious relationship (in red ink). What benefit is it for the user to
  manually define such a simple and unambiguous relationship? Can't it be
  automatically done, i.e., leaving out the red part?
  Will this feature likely be included in future versions, say, sqlalchemy
  1.0?
 

 You might be interested in the experimental automap extension:

   http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/automap.html

 Hope that helps,

 Simon

