Re: [sqlalchemy] Polymorphic selectin bug which incorrectly caches queries

2018-06-24 Thread vineet
Oops, missed the Benchling reference in there. Thanks!

On Sunday, June 24, 2018 at 7:09:51 PM UTC-7, Mike Bayer wrote:
>
> other than the imports and missing database setup I had to re-write ( :) ),
> great test!   issue is logged at
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4286/mapper_subclass_load_via_in-returns
> where I'll try to figure out a decent way to fix.


[sqlalchemy] Polymorphic selectin bug which incorrectly caches queries

2018-06-24 Thread vineet
Hello! We noticed some weird behavior where our tests were affecting one 
another, and I tracked it down to what I believe is a SQLAlchemy bug. It 
appears that performing a particular query with a joinedload can alter 
future queries that are run without the same joinedload. This appears to 
only happen when using 'polymorphic_load': 'selectin' (and not with 
'inline'):

from benchling import db
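
(A note on setup: `benchling.db` above is an internal Flask-SQLAlchemy
instance, which Mike mentions in the reply above he had to re-write. A
minimal standalone stand-in, assuming Flask-SQLAlchemy and an in-memory
SQLite database, might look like:)

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite://'
db = SQLAlchemy(app)
app.app_context().push()  # so db.session is usable outside a request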

### Table setup
class Parent(db.Model):
    __tablename__ = 'parent'
    id = db.Column(db.Integer, primary_key=True)


class Child(db.Model):
    __tablename__ = 'child'
    id = db.Column(db.Integer, primary_key=True)
    parent_id = db.Column(db.Integer, db.ForeignKey('parent.id'))
    parent = db.relationship('Parent', backref=db.backref('children'))

    type = db.Column(db.String(), nullable=False)
    __mapper_args__ = {
        'polymorphic_on': type,
    }


class ChildSubclass1(Child):
    __tablename__ = 'child_subclass1'
    id = db.Column(db.Integer, db.ForeignKey('child.id'), primary_key=True)
    __mapper_args__ = {'polymorphic_identity': 'subclass1',
                       'polymorphic_load': 'selectin'}


class Other(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    child_subclass_id = db.Column(db.Integer,
                                  db.ForeignKey('child_subclass1.id'))
    child_subclass = db.relationship('ChildSubclass1',
                                     backref=db.backref('others'))


Parent.__table__.create(bind=db.engine)
Child.__table__.create(bind=db.engine)
ChildSubclass1.__table__.create(bind=db.engine)
Other.__table__.create(bind=db.engine)


def is_other_in_session():
    return any(isinstance(model, Other) for model in db.session)


### Bug repro:
parent = Parent()
subclass1 = ChildSubclass1(parent=parent)
other = Other(child_subclass=subclass1)
db.session.add_all([parent, subclass1, other])
db.session.commit()


# Test 1: Loading Parent + Children does not load Other
print '\nTest #1: Loading Parent + Children does not load others'
db.session.expunge_all()
assert not is_other_in_session()
parents = Parent.query.options(db.joinedload(Parent.children.of_type(ChildSubclass1))).all()
assert not is_other_in_session()

print '\nTest #2: Loading Parent + Children + Others loads others as expected'
db.session.expunge_all()
assert not is_other_in_session()
parents = Parent.query.options(db.joinedload(Parent.children.of_type(ChildSubclass1)).joinedload(ChildSubclass1.others)).all()
assert is_other_in_session()

print '\nTest #3: Loading Parent + Children should not load others, but it does now'
db.session.expunge_all()
assert not is_other_in_session()
parents = Parent.query.options(db.joinedload(Parent.children.of_type(ChildSubclass1))).all()
assert not is_other_in_session(), 'The above query should not load any Others!'

Test #1 and Test #3 are identical, but Test #3 fails because it is run 
after Test #2 (which does a joinedload on ChildSubclass1.others). I thought 
this might be related to baked queries, but it still fails with 
enable_baked_queries=False. Let me know if there's any other information I 
can provide that would be useful, or if there are any workarounds we can 
use right now!



Re: [sqlalchemy] Non-deterministic behavior when setting relationships (1.2 regression)

2018-01-24 Thread vineet
Thanks so much Mike! I appreciate and am always amazed by your helpfulness 
and responsiveness.

On Wednesday, January 24, 2018 at 6:36:24 PM UTC-8, Mike Bayer wrote:
>
> 1.2.2 is released w/ this fix
>
> On Wed, Jan 24, 2018 at 10:16 AM, Mike Bayer wrote:
> > the plan is to fix and release by EOD, thank you
> >
> > https://bitbucket.org/zzzeek/sqlalchemy/issues/4171/list-assignment-does-not-seem-to-be



[sqlalchemy] Non-deterministic behavior when setting relationships (1.2 regression)

2018-01-23 Thread vineet
Hello! We're currently in the process of upgrading from SQLAlchemy 1.1.11 
to 1.2.1 (we're super excited about selectin!), and noticed some 
non-deterministic behavior in one of our tests. Here is the minimal repro 
that I was able to find:

class A(db.Model):
    __tablename__ = 'a'
    id = db.Column(db.Integer, primary_key=True)

class B(db.Model):
    __tablename__ = 'b'
    id = db.Column(db.Integer, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey('a.id'))
    a = db.relationship('A', backref=db.backref('bs'))


for i in xrange(100):
    a1 = A()
    a2 = A()

    b = B(a=a1)  # Set B.a = a1
    a2.bs = [b]  # ... but then put b onto a2's list

    db.session.add_all([a1, a2])
    db.session.commit()

    print len(a1.bs)

I expect "len(a1.bs)" to always be 0 (and this matches the behavior in 
SQLAlchemy 1.1). However, in SQLAlchemy 1.2, the length is *sometimes 1 and 
sometimes 0*. Empirically, it looks like it's 1 around 20% of the time. 
This is only the case if a1 and a2 are both pending - adding a commit 
before creating B() makes the non-deterministic behavior go away.

Let me know if there's any more information that I can provide here that 
would be helpful, happy to help address this in any way I can!



Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-11 Thread vineet
I ran some tests using raw psycopg2 on my local computer, and also ran some 
tests with SQLAlchemy. I misunderstood a few things in my tests above, but 
I've explained the tests below and think they are more accurate.

*Context:*
- Each test was run 5 times and averaged.
- Both the Python process and the database were running on the same 
computer (Macbook Pro (2015), using Postgres 9.5 and Docker)

*Raw psycopg2:*
# Rough code:
extras.execute_values(cursor, 'INSERT INTO table_a (field1, field2) VALUES %s', [...])
extras.execute_batch(cursor, 'INSERT INTO table_a (field1, field2) VALUES (%s, %s)', [...])
cursor.executemany('INSERT INTO table_a (field1, field2) VALUES (%s, %s)', [...])

# Inserting 1000 rows
execute_values, elapsed time: 0.023967s
execute_batch, elapsed time: 0.051530s
executemany, elapsed time: 0.173563s


# Inserting 10000 rows
execute_values, elapsed time: 0.268656s
execute_batch, elapsed time: 0.488736s
executemany, elapsed time: 2.017565s


# Inserting 100000 rows
execute_values, elapsed time: 1.858675s
execute_batch, elapsed time: 4.062823s
executemany, elapsed time: 19.900875s


*SQLAlchemy layer:*
for field1, field2 in genome_infos(rows):
    db.session.add(TableA(field1, field2))

# Time this part:
db.session.flush()

I used the code you provided above (instead of the flag you recently 
pushed), which allowed me to test both execute_batch and execute_values. 
Here is what I used:

@event.listens_for(Engine, "do_executemany")
def do_executemany(cursor, statement, parameters, context):
    context.dialect.supports_sane_multi_rowcount = False

    # Option: executemany
    cursor.executemany(statement, parameters)

    # Option: execute_batch
    extras.execute_batch(cursor, statement, parameters)

    # Option: execute_values
    statement = re.sub('VALUES.*', 'VALUES %s', statement)
    parameters = [(info['field1'], info['field2']) for info in parameters]
    extras.execute_values(cursor, statement, parameters)

    return True

Obviously only one option was used at a time :)
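
(A sketch of the engine-level alternative: assuming the flag referenced
above is the use_batch_mode option added to the psycopg2 dialect for issue
4109, enabling it would look like this; the connection URL is illustrative:)

from sqlalchemy import create_engine

# use_batch_mode=True routes executemany() through psycopg2's
# execute_batch() helper instead of the default executemany().
engine = create_engine(
    'postgresql+psycopg2://scott:tiger@localhost/test',
    use_batch_mode=True,
)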

*SQLAlchemy Results:*
Inserting 1000 rows:
execute_values: 0.083958s
execute_batch: 0.110223s
executemany: 0.276129s

Inserting 10000 rows:
execute_values: 1.243230s
execute_batch: 1.388278s
executemany: 3.131808s

Inserting 100000 rows:
execute_values: 13.909975s
execute_batch: 14.942507s
executemany: 29.671092s


*Conclusions:*

   - execute_batch is a significant improvement over executemany (10x at 
   the psycopg2 layer).
   - Subtracting the 11-12 seconds of SQLAlchemy/Python overhead for 
   inserting 100,000 rows gives roughly the psycopg2 times for each 
   execute_ option.
   - The 5000% improvements stated in 
   https://github.com/psycopg/psycopg2/issues/491#issuecomment-276551038 
   are probably exaggerated for most users - that was for "transatlantic 
   network connections", whereas I imagine most users have databases much 
   closer to their servers. However, it's still a 10x speed-up over 
   executemany at the psycopg2 layer, and a ~2-3x speed-up including 
   SQLAlchemy overhead.
   - execute_values is still twice as fast as execute_batch at the psycopg2 
   layer (for this specific table*), so incorporating that would be even 
   better!
   - execute_batch only helps with inserts to tables where all primary keys 
   are defined (as you noted). Thus, if users want to see improvements for 
   tables with auto-incrementing primary keys and relationships, they'll 
   likely need to combine this with something like the suggestion in 
   https://groups.google.com/forum/#!topic/sqlalchemy/GyAZTThJi2I

Hopefully that was helpful :) Happy to help test in any other ways as well!

On Wednesday, October 11, 2017 at 7:25:06 AM UTC-7, Mike Bayer wrote:
>
> On Wed, Oct 11, 2017 at 3:02 AM, wrote:
> > Hey Mike,
> >
> > Thanks again for the detailed explanations!
> >
> > I went ahead and tested the code snippet you gave me - I'm not sure I
> > totally understand when this will change behavior though.
> >
> > I tried the following code snippet:
> > (Snippet #1)
> > for i in xrange(10):
> >     db.session.add(A(id=i))
> > db.session.flush()
> >
> > This calls through to the "do_executemany" handler and only executes 1
> > insert statement with multiple VALUES. However, this was already the
> > existing behavior right?
>
> There's multiple forms of "one insert statement".
>
> There is traditional "executemany", which to the database looks like:
>
> INSERT INTO table (x, y) VALUES (:x, :y)
>
> The above statement is invoked for each set of parameters.
>
> then there is multiple values, which to the database looks like:
>
> INSERT INTO table (x, y) VALUES (:x1, :y1), (:x2, :y2), (:x3, :y3), ...
>
> The above statement is invoked once, with all sets of parameters at
> once (in reality they are usually batched in groups and the statement
> is invoked once per group).  MySQL drivers do this as does psycopg2
> execute_values().
>
> then there is psycopg2 execute_batch which yes, I fo

Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-11 Thread vineet
Hey Mike,

Thanks again for the detailed explanations!

I went ahead and tested the code snippet you gave me - I'm not sure I 
totally understand when this will change behavior though.

I tried the following code snippet:
(Snippet #1)
for i in xrange(10):
    db.session.add(A(id=i))
db.session.flush()

This calls through to the "do_executemany" handler and only executes 1 
insert statement with multiple VALUES. However, this was already the 
existing behavior right?

When I don't specify a primary key:
(Snippet #2)
for i in xrange(10):
    db.session.add(A())  # don't pre-specify a primary key
db.session.flush()

This does *not* call through to the do_executemany handler, and executes 10 
insert statements.

If I understand 
http://initd.org/psycopg/docs/extras.html#fast-execution-helpers correctly, 
execute_batch does *not* actually combine the insert statements into a 
single statement with multiple VALUES clauses, but instead sends multiple 
INSERT statements to the server at once. We would obviously still expect to 
see performance benefits from the reduced network round-trips, but it looks 
like the above tests do not show any improvement there.

Let me know if I'm testing that incorrectly - I'm also happy to test this 
on our actual production use cases to give you some more accurate speed-up 
times.

On Tuesday, October 10, 2017 at 10:36:33 AM UTC-7, Mike Bayer wrote:
>
> I've added
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode
> which will have a code review up soon.
>
> On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer wrote:
> > On Tue, Oct 10, 2017 at 1:47 AM, wrote:
> >> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work
> >> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't
> >> totally sure what the desired behavior was for single-table inheritance
> >> models.
> >
> > yes, because you'll note sub1 and sub2 have different columns
> > mapped, the actual columns in each INSERT statement are different.
> > At the same time it is maintaining INSERT order into the table
> > overall, so it has no choice but to break things up in this way.
> >
> >> 2) If this is desired behavior, do you have any suggestions of how to
> >> optimize the number of queries here in e.g. a before_flush hook?

[sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-09 Thread vineet
Hello! SQLAlchemy has the (really useful) behavior of batching together 
inserts when all the primary keys are defined. It looks like it groups 
similar tables together (while maintaining insert order) to minimize the 
number of Insert statements sent to the database.

I'm unsure what the expected behavior here is for single-table inheritance 
tables. Here's some example models:
class Base(db.Model):
    __tablename__ = 'base'
    id = db.Column(db.Integer, primary_key=True)
    base_type = db.Column(db.String)

    __mapper_args__ = {
        'polymorphic_on': base_type,
    }

    def __init__(self, id_):
        self.id = id_


class SubBase1(Base):
    __mapper_args__ = {
        'polymorphic_identity': '1',
    }

    col = db.Column(db.String)


class SubBase2(Base):
    __mapper_args__ = {
        'polymorphic_identity': '2',
    }

    col2 = db.Column(db.String)


class OtherModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)

    def __init__(self, id_):
        self.id = id_


Base, SubBase1, and SubBase2 form a tree, while OtherModel is just another 
model.

When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the 
INSERTs such that only two INSERT statements are sent to the DB:
(Snippet #1)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(OtherModel(i + 1))

db.session.flush()  # Performs 2 insert statements, each with 5 elements in the VALUES clause

However, when creating alternating SubBase1s and SubBase2s, it actually 
performs 10 separate insert statements:
(Snippet #2)
for i in xrange(0, 10, 2):
    db.session.add(SubBase1(i))
    db.session.add(SubBase2(i + 1))
db.session.flush()  # Performs 10 insert statements, each with 1 element in the VALUES clause


It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and 
SubBase2, but also isn't able to do a single INSERT statement with 10 
elements in the VALUES clause (or 2 INSERT statements with 5 elements each).

Questions:
1) Is Snippet #2 the desired behavior? I read through the unit-of-work 
summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally 
sure what the desired behavior was for single-table inheritance models.
2) If this is desired behavior, do you have any suggestions of how to 
optimize the number of queries here in e.g. a before_flush hook? Obviously 
one way would be to rewrite our application such that inserts of like 
tables happen together, but if there is a way to do this without rewriting 
the application code, that would be even better. One option I've thought 
of is:
  - Overwrite the insert_order in the before_flush hook to rearrange the 
models in the order I want, as in the sketch below. (Not sure if this has 
effects that I'm unaware of though.)
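
(A rough, untested sketch of that insert_order idea. It relies on the
internal InstanceState.insert_order attribute that the unit of work sorts
on, and it ignores inter-object dependencies, so it is only plausible when
the reordered rows are independent of one another:)

from sqlalchemy import event
from sqlalchemy.orm import Session, attributes

@event.listens_for(Session, 'before_flush')
def group_like_inserts(session, flush_context, instances):
    # Re-number insert_order so pending objects of the same class are
    # contiguous, keeping the original order within each class.
    states = [attributes.instance_state(obj) for obj in session.new]
    states.sort(key=lambda state: (state.class_.__name__, state.insert_order))
    for i, state in enumerate(states):
        state.insert_order = i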

Thanks!



Re: [sqlalchemy] Batching INSERT statements

2017-10-09 Thread vineet

>
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
>
> identities = [
>     val for val, in session.execute(
>         "select nextval('mytable_seq') from "
>         "generate_series(1,%s)" % len(my_objects)
>     )
> ]
> for ident, obj in zip(identities, my_objects):
>     obj.pk = ident

Wow, that's a great idea! I got it working for most of our models. I have 
some questions about how inserts for joined-table inheritance tables are 
batched together, but I'll ask them in a separate post since they're 
somewhat unrelated to this.

> So the complexity of adding multi-values insert with sequences would
> benefit an extremely narrow set of use cases, would be very labor
> intensive to implement and maintain, and is unnecessary for the single
> target platform in which this case would work.
That makes sense, thanks for the explanation!

 

On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 4:15 AM, wrote:
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch
> > inserts, and have a few questions about bulk_save_objects (and flushing
> > in general).
> >
> > Two statements that I think are true:
> >
> > Right now, bulk_save_objects does not fetch primary keys for inserted
> > rows (unless return_defaults is passed in, which mitigates the
> > performance gains by performing one insert statement per model).
>
> the point of bulk save is that the objects passed to it are considered
> as throwaway after the operation, to avoid the overhead of maintaining
> the objects' persistence state in memory.   Using it with
> return_defaults is nearly pointless as this will blow away the
> efficiency gains you might get from the bulk operation, and is there
> mostly to allow the joined-table inheritance use case to be usable.
>
> > When running db.session.flush(), it looks like SQLAlchemy can batch
> > model creation into a single multi-value insert statement *only if all
> > of the primary keys are already defined on the model*. (Verified by
> > looking at
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)
> >
> > Questions:
> >
> > Would it be possible for bulk_save_objects to fetch primary keys while
> > still batching INSERTs into a single query?
>
> What do you mean "single query", do you mean, executemany()?  the
> answer is no because DBAPI drivers don't return result sets with
> executemany().  Do you mean, a single INSERT..VALUES with all the
> parameters in one statement?  The answer is no because SQLAlchemy
> doesn't include multi-values as a transparent option; the DBAPI
> drivers instead make use of the multi-values syntax within their
> executemany() implementations, where again, they don't return result
> sets.
>
> Right now, only the MySQL drivers do this by default; the psycopg2
> driver does it with a recently added option that SQLAlchemy does not
> directly support, however you can set it up via a connection event.
> For psycopg2 it can vastly speed up inserts as psycopg2's normal
> executemany() implementation has some performance issues.
>
> > (This would help with e.g. inserting a bunch of rows into a table with
> > an auto-incrementing primary key).
>
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
>
> identities = [
>     val for val, in session.execute(
>         "select nextval('mytable_seq') from "
>         "generate_series(1,%s)" % len(my_objects)
>     )
> ]
> for ident, obj in zip(identities, my_objects):
>     obj.pk = ident
>
> Now you don't need to do RETURNING or anything and the inserts can be
> at their top efficiency.
>
> > At least in Postgres (haven't verified for other databases), it looks
> > like one can use RETURNING for inserts with multi-value VALUES clauses.
> > At the surface, it seems like one could extend the current behavior for
> > a single row INSERT:
> >
> > INSERT INTO table ... VALUES (1) RETURNING id
> >
> > to multi-value INSERTS:
> >
> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id
> >
> > and get all the benefits of the ORM while still batching inserts.
>
> This would be an enormous undertaking to implement, test, and release.
> It would have to exist as an entirely additional series of codepaths
> within persistence.py as multi-valued INSERT is not available on most
> databases as well as on earlier versions of the databases that do
> support it.  For all of that complexity, the approach would work on
> exactly: Postgresql only - SQLite and MySQL don't support RETURNING,
> Oracle, SQL Server and others don't support INSERT..VALUES with
> multiple sets.
>
> If you are using Postgresql, you can instead pre-fetch the sequence up
> front.  This can even be integrated into a

[sqlalchemy] Batching INSERT statements

2017-10-09 Thread vineet
Hello! I've spent some time looking at SQLAlchemy's ability to batch 
inserts, and have a few questions about bulk_save_objects (and flushing in 
general).

Two statements that I think are true:

   1. Right now, bulk_save_objects does not fetch primary keys for 
   inserted rows (unless return_defaults is passed in, which mitigates the 
   performance gains by performing one insert statement per model).
   2. When running db.session.flush(), it looks like SQLAlchemy can batch 
   model creation into a single multi-value insert statement *only if all 
   of the primary keys are already defined on the model*. (Verified by 
   looking at 
   sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)


Questions:

   1. Would it be possible for bulk_save_objects to fetch primary keys 
   while still batching INSERTs into a single query?
   2. Or even better, would it be possible to batch INSERT statements 
   during a flush even when models don't have their primary keys pre-defined? 
   (This would help with e.g. inserting a bunch of rows into a table with an 
   auto-incrementing primary key).

At least in Postgres (haven't verified for other databases), it looks like 
one can use RETURNING for inserts with multi-value VALUES clauses. At the 
surface, it seems like one could extend the current behavior for a single 
row INSERT:

INSERT INTO table ... VALUES (1) RETURNING id

to multi-value INSERTS:

INSERT INTO table ... VALUES (1), (2), ... RETURNING id

and get all the benefits of the ORM while still batching inserts.

I'm sure this is something that was thought about already, so I'd mostly 
love to hear any thoughts about what makes this hard. We have a lot of 
complex relationships and joined-table inheritance, so working with 
bulk_save_objects has been a bit challenging. From the comments in 
sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like 
SQLAlchemy is already aware that RETURNING for insert statements with 
multi-values VALUES clauses is supported, so it's possible there is a 
reason that this is hard or can't be done.

This is also something that, if it is just a “missing feature”, I would 
love to learn more and see if I could contribute.



Re: [sqlalchemy] Re: A "after flush before commit" event

2017-09-19 Thread vineet
Hey Mike,

Thanks for taking the time to answer all of these questions! I think what 
we're looking for is a way to perform some actions immediately before 
"COMMIT" is sent to the database. The current solutions (as far as I can 
understand them) all have some drawbacks:

*Perform actions in before_commit*
This event fires before a flush event, so the following code performs the 
actions *before* the data has been flushed:

a = A()
a.name = 'newname'
db.session.commit()


*Mark a boolean flag in before_commit, and do actions in after_flush if the 
boolean is set*
This fixes the above problem, but runs into a new issue:

a = A()
a.name = 'newname'
db.session.flush()  # Boolean isn't set, so actions don't fire
db.session.commit()  # There is no dirty state, so after_flush never gets 
called here and actions don't fire


It looks like what we're looking for is an "immediately_before_commit_query" 
hook. Does this exist in SQLAlchemy right now? If not, another alternative 
we could go with is:

*Perform actions in before_commit, but call db.session.flush() first* 
(suggested by Jonathan Vanasco earlier in this thread)
We manually call db.session.flush() to ensure that we don't have any 
remaining "dirty" state, and in essence we can now perform actions 
"immediately before the COMMIT query". It would look something like:

@event.listens_for(SignallingSession, 'before_commit')
def actions(session):
    db.session.flush()
    # Do other actions


I'm not immediately confident that this is equivalent to the conceptual 
"immediately_before_commit_query" hook - mostly not sure if there are any 
gotchas with this approach. Do you think this is a safe thing to do?

On Tuesday, September 19, 2017 at 1:46:23 PM UTC-7, Mike Bayer wrote:
>
> On Tue, Sep 19, 2017 at 4:06 PM, wrote:
> > Just checking, but one scenario where "after flush before commit" would
> > not work is when all changes are flushed immediately before a commit,
> > right? In that case, "commit" would not automatically be preceded by a
> > "flush" (and the "after_flush_postexec" code would not run because no
> > token from "before_commit" has been set).
>
> if there is nothing to flush, like there's no objects or no attributes
> have been modified, then when commit() is called, it is usually the
> case that no flush occurs and you would not see these events.
>
> > e.g.,
> > # make some changes
> > session.flush()
> > session.commit()  # Does not kick off its own flush
> >
> > On Tuesday, September 19, 2017 at 6:27:12 AM UTC-7, Mike Bayer wrote:
> >>
> >> On Tue, Sep 19, 2017 at 12:19 AM, wrote:
> >> > Followup question -- is the following order of events guaranteed?
> >> >
> >> > before_commit --> after_flush --> after_flush_post_exec
> >> >
> >> > i.e., is an after_flush_post_exec guaranteed to run after the
> >> > before_commit preceding a commit?
> >>
> >> if the flush didn't raise then yes.


Re: [sqlalchemy] Potential bug with .exists() queries and subqueryloads

2017-07-18 Thread vineet

>
> however, I can't think of any reason public use of .subquery() or 
> .exists() would ever *want* eagerloaded options to take place since by 
> definition .subquery() and .exists() are never used to load objects. 
> So in 1.2 I'll propose to make this automatic
>

Yep, I agree with this! Thanks, I'll use .enable_eagerloads(False) to fix 
it for now.
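
(A small sketch of that workaround, using the A model from the original
post below:)

# Disable eager loaders before rendering the EXISTS subquery.
exists_clause = A.query.enable_eagerloads(False).exists()
print db.session.query(exists_clause).scalar()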

On Tuesday, July 18, 2017 at 11:56:08 AM UTC-7, Mike Bayer wrote:
>
> On Tue, Jul 18, 2017 at 2:13 PM, wrote:
> > Hello, I'm running into some difference in behavior for .exists() in
> > SQLAlchemy 1.0.17 and 1.1.11, related to subqueryloads.
> >
> > In SQLAlchemy 1.0.17, I can do all of the following:
> >
> > A.query.exists()  # Note that A.b has lazy='subquery'
> > A.query.options(subqueryload(A.b)).exists()
> >
> > # We usually use this in the context of:
> > db.session.query(A.query.exists()).scalar()
>
> There is a bug here in add_column() / subqueryload added as
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4033/dont-crash-for-add_columns-when.
>
> however if you go into your 1.0 version and instead do lazy="joined"
> you'll see your EXISTS query rendering out the LEFT OUTER JOIN
> entirely unnecessarily.
>
> Current API when you're turning the query into a subquery or exists is
> to call enable_eagerloads(False):
>
> print s.query(A).enable_eagerloads(False).exists()
>
> the method is documented as such:
>
> https://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=enable_eagerloads#sqlalchemy.orm.query.Query.enable_eagerloads
>
> however, I can't think of any reason public use of .subquery() or
> .exists() would ever *want* eagerloaded options to take place since by
> definition .subquery() and .exists() are never used to load objects.
> So in 1.2 I'll propose to make this automatic, that's
>
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4032/enable_eagerloads-false-automatically-for.

[sqlalchemy] Potential bug with .exists() queries and subqueryloads

2017-07-18 Thread vineet
Hello, I'm running into some difference in behavior for .exists() in 
SQLAlchemy 1.0.17 and 1.1.11, related to subqueryloads.


class A(db.Model):
    __tablename__ = 'a'

    id = db.Column(db.Integer, primary_key=True)
    b_id = db.Column(db.Integer, db.ForeignKey('b.id'), nullable=False)
    b = db.relationship('B', lazy='subquery')


class B(db.Model):
    __tablename__ = 'b'

    id = db.Column(db.Integer, primary_key=True)


In SQLAlchemy 1.0.17, I can do all of the following:

A.query.exists()  # Note that A.b has lazy='subquery'
A.query.options(subqueryload(A.b)).exists()

# We usually use this in the context of:
db.session.query(A.query.exists()).scalar()

In SQLALchemy 1.1.11, we run into the following error:
A.query.exists()  # Note that A.b has lazy='subquery'

.../lib/python2.7/site-packages/sqlalchemy/orm/query.pyc in exists(self)
   3046 # .with_only_columns() after we have a core select() so that
   3047 # we get just "SELECT 1" without any entities.
-> 3048 return sql.exists(self.add_columns('1').with_labels().
   3049   statement.with_only_columns([1]))
   3050

.../lib/python2.7/site-packages/sqlalchemy/orm/query.pyc in statement(self)
465 """
466
--> 467 stmt = self._compile_context(labels=self._with_labels).\
468 statement
469 if self._params:

.../lib/python2.7/site-packages/sqlalchemy/orm/query.pyc in 
_compile_context(self, labels)
   3363
   3364 for entity in self._entities:
-> 3365 entity.setup_context(self, context)
   3366
   3367 for rec in context.create_eager_joins:

.../lib/python2.7/site-packages/sqlalchemy/orm/query.pyc in 
setup_context(self, query, context)
   3725 with_polymorphic=self._with_polymorphic,
   3726 only_load_props=query._only_load_props,
-> 3727 
polymorphic_discriminator=self._polymorphic_discriminator)
   3728
   3729 def __str__(self):

.../lib/python2.7/site-packages/sqlalchemy/orm/loading.pyc in 
_setup_entity_query(context, mapper, query_entity, path, adapter, 
column_collection, with_polymorphic, only_load_props, 
polymorphic_discriminator, **kw)
257 column_collection=column_collection,
258 memoized_populators=quick_populators,
--> 259 **kw
260 )
261

.../lib/python2.7/site-packages/sqlalchemy/orm/interfaces.pyc in 
setup(self, context, entity, path, adapter, **kwargs)
515 else:
516 strat = self.strategy
--> 517 strat.setup_query(context, entity, path, loader, adapter, 
**kwargs)
518
519 def create_row_processor(

.../lib/python2.7/site-packages/sqlalchemy/orm/strategies.pyc in 
setup_query(self, context, entity, path, loadopt, adapter, 
column_collection, parentmapper, **kwargs)
786 orig_query, leftmost_mapper,
787 leftmost_attr, leftmost_relationship,
--> 788 entity.entity_zero
789 )
790

.../lib/python2.7/site-packages/sqlalchemy/orm/strategies.pyc in 
_generate_from_original_query(self, orig_query, leftmost_mapper, 
leftmost_attr, leftmost_relationship, orig_entity)
859 ent['entity'] for ent in 
orig_query.column_descriptions
860 ])),
--> 861 False
862 )
863

.../lib/python2.7/site-packages/sqlalchemy/orm/query.pyc in 
_set_select_from(self, obj, set_base_alias)
191
192 for from_obj in obj:
--> 193 info = inspect(from_obj)
194 if hasattr(info, 'mapper') and \
195 (info.is_mapper or info.is_aliased_class):

.../lib/python2.7/site-packages/sqlalchemy/inspection.pyc in 
inspect(subject, raiseerr)
 73 "No inspection system is "
 74 "available for object of type %s" %
---> 75 type_)
 76 return ret
 77

NoInspectionAvailable: No inspection system is available for object of type 



In particular, a subqueried relationship (either through lazy='subquery' or 
subqueryload(A.b)) causes .exists() to fail. We obviously don't call 
subqueryload(A.b) directly with exists queries, but we have models using 
lazy='subquery' that we use .exists() queries for.

My current workaround is A.query.options(lazyload('*')).exists(). Does this 
seem like a bug introduced in 1.1, or were there any intentional changes 
regarding use of .exists() queries?

Thanks!


Re: [sqlalchemy] Bug report with JSON nulls in 1.1.x

2017-07-18 Thread vineet

>
> OK, this is great, it's likely not the "underscore" as much as that 
> the attribute is named differently from the column.  I will look to 
> confirm this and set up a complete bug report, thanks!
>
 
Ah, that makes a lot more sense. Thanks for looking into that!

> Above, you are looking at "a.column2", did you mean "a._column2"?
> There should be no ".column2" attribute on the object.

Woops, typo. That's what I get for trying to clean up the code to make it 
look nicer without rerunning it. 

> the JSON column will translate "None" into JSON.NULL as it goes to the
> database.   The "default" here is a python side-default, so the python
> side value will be maintained, so ideally we'd want to set
> default=None, but that means, "there's no default" so we're in a
> little bit of a pickle there.  this issue was discussed in
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3870/client-side-column-default-value,
> where the solution was a new bit of documentation added to the notes
> for JSON.   the solution is at
> http://docs.sqlalchemy.org/en/latest/core/type_basics.html?highlight=json#sqlalchemy.types.JSON.NULL
> and describes to set the default to a SQL expression that resolves to
> JSON NULL; the ORM will fetch this default back from the database; the
> flush process will expire the Python value so that this fetch occurs
> prior to the usual expiration at commit time.    Hopefully you're
> using Postgresql in which case you can set eager_defaults=True on the
> mapping and it will re-fetch the value inline with the INSERT using
> RETURNING.
>

Got it, thanks. I think this will work well for us. We are using Postgres 
as well :)

> JSON.NULL is this totally special value that the Core default mechanism
> doesn't yet fully accommodate so that's why setting default=text("'null'")
> for now is a workaround.

Makes sense, and a big thanks for the quick responses!
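
(A small sketch of the described workaround, assuming Postgresql for the
eager_defaults behavior mentioned above; the model is illustrative:)

from sqlalchemy import text

class A(db.Model):
    __tablename__ = 'a'
    __mapper_args__ = {'eager_defaults': True}  # re-fetch defaults via RETURNING

    id = db.Column(db.Integer, primary_key=True)
    # default=text("'null'") inserts a JSON null when no value is provided
    column1 = db.Column('column1', JSON, nullable=False,
                        default=text("'null'"))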
 

On Tuesday, July 18, 2017 at 8:02:56 AM UTC-7, Mike Bayer wrote:
>
> On Tue, Jul 18, 2017 at 10:31 AM, Mike Bayer wrote:
> > On Tue, Jul 18, 2017 at 12:47 AM, wrote:
> >> Hello, I'd like to file a bug report regarding JSON nulls and
> >> underscore-prefixed attributes as we're upgrading from 1.0.17 to 1.1.11.
> >>
> >> However, this does not seem to work for attributes which are prefixed
> >> with an underscore.
> >
> > OK, this is great, it's likely not the "underscore" as much as that
> > the attribute is named differently from the column.  I will look to
> > confirm this and set up a complete bug report, thanks!
>
> set up w/ a patch at
> https://bitbucket.org/zzzeek/sqlalchemy/issues/4031/eval_none-logic-isnt-resolving-col-to-prop.

[sqlalchemy] Re: Bug report with JSON nulls in 1.1.x

2017-07-17 Thread vineet
A related issue (that seems like a bug) happens when reading attribute 
values after a flush, but before a commit. In this scenario, I set the 
default=JSON.NULL in both columns:

class A(db.Model):
    __tablename__ = 'a'

    id = db.Column(db.Integer, primary_key=True)
    column1 = db.Column('column1', JSON, nullable=False, default=JSON.NULL)
    _column2 = db.Column('column2', JSON, nullable=False, default=JSON.NULL)

a = A(column1=None, _column2=None)
db.session.add(a)

# Expected
print a.column1   # None
print a._column2  # None


# Unexpected differences
db.session.flush()
print a.column1   # None
print a._column2  # symbol('JSON_NULL')

db.session.commit()
print a.column1   # None
print a._column2  # None


I believe this is expected given the bug listed above. However, I have a 
question about how to handle this even when the original bug is fixed.

With "default=JSON.NULL", it seems like the value of a.column1 can be None 
OR symbol('JSON_NULL') depending on if it is read 1) before flush, 2) after 
flush, and 3) after commit. Are there any suggested ways of dealing with 
this? (Or it's possible I'm doing something wrong here!) We have code that 
checks "if a.column1 is None", which is failing with the 1.1.11 upgrade.

On Monday, July 17, 2017 at 9:47:10 PM UTC-7, vin...@benchling.com wrote:
>
> Hello, I'd like to file a bug report regarding JSON nulls and
> underscore-prefixed attributes as we're upgrading from 1.0.17 to 1.1.11.



[sqlalchemy] Bug report with JSON nulls in 1.1.x

2017-07-17 Thread vineet
Hello, I'd like to file a bug report regarding JSON nulls and 
underscore-prefixed attributes as we're upgrading from 1.0.17 to 1.1.11.

I read through the behavior at 
http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#change-3514 
and 
http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#behavior-change-3514,
 
all of which makes sense to me.

However, this does not seem to work for attributes which are prefixed with 
an underscore. Example:

class A(db.Model):
    __tablename__ = 'a'

    id = db.Column(db.Integer, primary_key=True)
    column1 = db.Column('column1', JSON, nullable=False)
    _column2 = db.Column('column2', JSON, nullable=False)

# Succeeds
db.session.add(A(column1=[], _column2=[]))
db.session.flush()

# Succeeds
db.session.add(A(column1=None, _column2=[]))
db.session.flush()

# Fails with integrity error
db.session.add(A(column1=[], _column2=None))
db.session.flush()


(psycopg2.IntegrityError) null value in column "column2" violates not-null 
constraint
DETAIL:  Failing row contains (5, [], null).
 [SQL: '-- manage.py:1242 in \nINSERT INTO a (column1) VALUES 
(%(column1)s) RETURNING a.id'] [parameters: {'column1': '[]'}]

Using db.Column('column2', JSON, nullable=False, default=JSON.NULL) ends up 
sending a JSON NULL, so basically it looks like it is ignoring explicitly 
set None values for attributes that start with an underscore.

This is not workflow blocking for us (I will just change our usages to also 
use default=JSON.NULL), but wanted to file a bug report in case it's 
something others run into as well!



[sqlalchemy] How to have SQL IF in sqlalchemy

2014-06-18 Thread Vineet Goel
Hi,

I am trying to convert the following SQL to SQLAlchemy:

SELECT teams.department, teams.team,
IF(employee_managers.team_id IS NOT NULL, employee_managers.manager, 
teams.default_manager) AS manager
FROM teams
LEFT JOIN employee_managers ON employee_managers.team_id = teams.id
WHERE teams.department = 'hr' and (employee_managers.name = 'vineet' OR 
employee_managers.name IS NULL)

where my models look like this:

class EmployeeTeam(db.Model):
  __tablename__ = 'teams'
  id = db.Column(db.Integer, primary_key=True)
  name = db.Column(db.String(140), nullable=False)
  department = db.Column(db.String(140), nullable=False)
  team = db.Column(db.String(140), nullable=False)
  default_manager = db.Column(db.String(140), nullable=False)


class EmployeeManagers(db.Model):
  __tablename__ = 'employee_managers'
  id = db.Column(db.Integer, primary_key=True)
  name = db.Column(db.String(140), nullable=False)
  team_id = db.Column(db.Integer, db.ForeignKey('teams.id'))
  team = db.relationship('EmployeeTeam')
  manager = db.Column(db.String(140), nullable=False)

After hours of googling, I found some func.IF kind of stuff, but it all 
keeps giving me an OperationalError: (OperationalError) (1248, 'Every 
derived table must have its own alias').

Does anyone know a simple way to convert this SQL to SQLAlchemy?

Any help would be much appreciated.
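
(One possible translation, sketched with SQLAlchemy's case() construct,
which renders a portable CASE expression equivalent to MySQL's IF(); the
query below is illustrative and assumes the models above:)

from sqlalchemy import case

# IF(employee_managers.team_id IS NOT NULL,
#    employee_managers.manager, teams.default_manager)
manager = case(
    [(EmployeeManagers.team_id != None, EmployeeManagers.manager)],
    else_=EmployeeTeam.default_manager,
).label('manager')

query = (
    db.session.query(EmployeeTeam.department, EmployeeTeam.team, manager)
    .outerjoin(EmployeeManagers, EmployeeManagers.team_id == EmployeeTeam.id)
    .filter(EmployeeTeam.department == 'hr')
    .filter(db.or_(EmployeeManagers.name == 'vineet',
                   EmployeeManagers.name == None))
)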
