[sqlalchemy] replicating/triggering a session connection issue for testing?

2017-08-31 Thread Jonathan Vanasco
I discovered an edge-case in one of our apps today.  A form got pummeled by 
a botnet that was working through a list of compromised account data looking 
for hits (e.g. credentials that match haveibeenpwned.com).  That triggered a 
lot of internal metrics logging, which eventually ran a dedicated logging 
PostgreSQL server out of disk space for a bit.  

The same logging server also handles our "exception" logging, which is done 
via a separate dedicated session.  Something in our stack didn't close the 
connection/reset the session when it should have, and the session/connections 
stayed active.  That led to the connection pool constantly checking out bad 
sessions for several hours.  

Free space was quickly recovered, and every other SQLAlchemy system in the 
same app immediately went back to working fine.  The only thing that broke, 
and stayed broken, was our "python exception" logger.

Does anyone know of a way for me to somehow break/wedge a given engine's 
connection?  I'm trying to recreate this edge-case, and failing miserably. 


The error was `sqlalchemy.exc.InvalidRequestError` with the message: 

InvalidRequestError: This Session's transaction has been rolled back 
due to a previous exception during flush. To begin a new transaction with 
this Session, first issue Session.rollback(). Original exception was: 
(psycopg2.OperationalError) could not extend file "base/16410/23816": No 
space left on device
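That "rolled back" Session state can be reproduced locally without filling a disk: force any flush to fail, then keep using the Session without calling rollback().  A minimal sketch using SQLite; the table and names are illustrative, not from the app in question:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.exc import IntegrityError, InvalidRequestError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Item(id=1))
session.commit()

# A duplicate primary key makes flush() raise, leaving the Session's
# transaction in the "rolled back due to a previous exception" state.
session.add(Item(id=1))
try:
    session.flush()
except IntegrityError:
    pass  # deliberately NOT calling session.rollback(), to wedge the Session

caught = None
try:
    session.query(Item).all()  # any further use now fails
except InvalidRequestError as exc:  # PendingRollbackError in modern versions
    caught = exc

session.rollback()  # the fix: after this the Session is usable again
assert session.query(Item).count() == 1
```

The key detail is that the exception repeats on every subsequent use until rollback() is issued, which matches a logger session staying broken for hours after the disk recovered.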




-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken MacKenzie
On a side note, the literal_column change worked fine performance-wise, 
allowing me to remove the literal_bind/compile step.

I guess I am 50/50 on which is cleaner.  Is either one more advantageous 
from a SQL injection perspective?

SQL injection is not a primary concern at the moment for this process, as 
what we are dealing with is a data replication service pulling from a U2 
data store to SQL Server.  The delete is part of batch cleanup testing.

On Thursday, August 31, 2017 at 5:13:58 PM UTC-4, Ken MacKenzie wrote:
>
> Tried doing various conversions on the pk values as they enter the 
> statement:
> 1.  to bytes
> 2.  to ascii
> 3. to latin1 (technically the same encoding as the extract source before 
> entering the db)
>
> None of which yielded a performance improvement for the non-compiled 
> version.
>
> I have read that this can be an issue with pyodbc and that there are 
> engine settings related to it.  Also perhaps I should try using the pymssql 
> driver to see if that changes anything.
>
> On Thursday, August 31, 2017 at 5:00:47 PM UTC-4, Ken MacKenzie wrote:
>>
>> So inspecting the elements of the tuple, they are both str, so hence 
>> unicode.
>>
>> Are you saying that if I convert those values to bytes it could improve 
>> performance?
>>
>>  
>>
>>> I'd not bother with the literal_binds and just use a literal value: 
>>>
>>> pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)] 
>>>
>>> but also I'd look to see what the nature of "v" is, if it's like a 
>>> Unicode object or something, you might be getting bogged down on the 
>>> decode/encode or something like that.   Sending as bytes() perhaps 
>>> might change that. 
>>>
>>



Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken MacKenzie
Tried doing various conversions on the pk values as they enter the 
statement:
1.  to bytes
2.  to ascii
3. to latin1 (technically the same encoding as the extract source before 
entering the db)

None of which yielded a performance improvement for the non-compiled 
version.

I have read that this can be an issue with pyodbc and that there are engine 
settings related to it.  Also perhaps I should try using the pymssql driver 
to see if that changes anything.

On Thursday, August 31, 2017 at 5:00:47 PM UTC-4, Ken MacKenzie wrote:
>
> So inspecting the elements of the tuple, they are both str, so hence 
> unicode.
>
> Are you saying that if I convert those values to bytes it could improve 
> performance?
>
>  
>
>> I'd not bother with the literal_binds and just use a literal value: 
>>
>> pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)] 
>>
>> but also I'd look to see what the nature of "v" is, if it's like a 
>> Unicode object or something, you might be getting bogged down on the 
>> decode/encode or something like that.   Sending as bytes() perhaps 
>> might change that. 
>>
>



Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken MacKenzie
So inspecting the elements of the tuple, they are both str, hence unicode.

Are you saying that if I convert those values to bytes it could improve 
performance?

 

> I'd not bother with the literal_binds and just use a literal value: 
>
> pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)] 
>
> but also I'd look to see what the nature of "v" is, if it's like a 
> Unicode object or something, you might be getting bogged down on the 
> decode/encode or something like that.   Sending as bytes() perhaps 
> might change that. 
>



Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Mike Bayer
On Thu, Aug 31, 2017 at 2:19 PM, Ken MacKenzie  wrote:
> So here is my solution as of now.  As far as I can tell hits the marks of:
>
> 1. DB Neutral
> 2. Performance (9k records deleted in 5 seconds)
> 3. Generic
>
> (excuse the print statements still there for testing)
>
> Would welcome comments on this proposed solution.  I have learned a lot from
> the advice in this thread and can always stand to learn more for the sake of
> my code.
>
> @classmethod
> def dict_delete(cls, dict_in):
> """
> Delete records from the database
> based on a dictionary keyed by
> PK tuple
> """
> s = dbc.get_session()
>
> d_start = datetime.datetime.now()
> keys_in = [k for k in dict_in.keys()]
> batch_size = 1000
> cols = [getattr(cls, colname) for colname in cls.SQL_PK]
> while len(keys_in):
> id_batch = keys_in[:batch_size]
> del keys_in[:batch_size]
>
> or_cond = []
> for x in id_batch:
>
> pkf = [(col == v) for (col, v) in zip(cols, x)]
> and_cond = and_(*pkf)
> or_cond.append(and_cond)
>
> stmt = cls.__table__.delete(or_(*or_cond))
> stmt_comp = str(stmt.compile(dialect=s.bind.dialect,
> compile_kwargs={"literal_binds": True}))

I'd not bother with the literal_binds and just use a literal value:

pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)]

but also I'd look to see what the nature of "v" is, if it's like a
Unicode object or something, you might be getting bogged down on the
decode/encode or something like that.   Sending as bytes() perhaps
might change that.
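A minimal sketch of that suggestion (table, columns, and values here are illustrative): the values are rendered inline into the SQL text, so no bind parameters are generated at all, and the values must come from a trusted source since literal_column bypasses bind-parameter escaping:

```python
import sqlalchemy as sa
from sqlalchemy import literal_column

metadata = sa.MetaData()
# Stand-in for the real table: a two-column varchar primary key.
table = sa.Table(
    "item", metadata,
    sa.Column("pk1", sa.String(3), primary_key=True),
    sa.Column("pk2", sa.String(10), primary_key=True),
)

cols = [table.c.pk1, table.c.pk2]
x = ("ABC", "0000000001")  # one PK tuple from a batch

# Each PK value becomes an inline SQL literal rather than a bindparam.
pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)]
stmt = table.delete().where(sa.and_(*pkf))

# The compiled string contains the values directly, no placeholders.
sql = str(stmt.compile())
```

This skips the separate compile(literal_binds=True) pass entirely, since there is nothing left to bind.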


>
> print(stmt_comp)
> print(len(keys_in), "-", datetime.datetime.now() - d_start)
>
> s.execute(stmt_comp)
>
> s.commit()
>
> s.close()
>
>



Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken MacKenzie
So here is my solution as of now.  As far as I can tell it hits the marks of:

1. DB Neutral
2. Performance (9k records deleted in 5 seconds)
3. Generic

(excuse the print statements still there for testing)

Would welcome comments on this proposed solution.  I have learned a lot 
from the advice in this thread and can always stand to learn more for the 
sake of my code.

@classmethod
def dict_delete(cls, dict_in):
    """
    Delete records from the database
    based on a dictionary keyed by
    PK tuple
    """
    s = dbc.get_session()

    d_start = datetime.datetime.now()
    keys_in = [k for k in dict_in.keys()]
    batch_size = 1000
    cols = [getattr(cls, colname) for colname in cls.SQL_PK]
    while len(keys_in):
        id_batch = keys_in[:batch_size]
        del keys_in[:batch_size]

        or_cond = []
        for x in id_batch:
            pkf = [(col == v) for (col, v) in zip(cols, x)]
            and_cond = and_(*pkf)
            or_cond.append(and_cond)

        stmt = cls.__table__.delete(or_(*or_cond))
        stmt_comp = str(stmt.compile(dialect=s.bind.dialect,
                                     compile_kwargs={"literal_binds": True}))

        print(stmt_comp)
        print(len(keys_in), "-", datetime.datetime.now() - d_start)

        s.execute(stmt_comp)

    s.commit()
    s.close()




Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken MacKenzie
Ending output from batch size of 10:

501 - 0:24:35.558103
491 - 0:24:37.197976
481 - 0:24:38.909318
471 - 0:24:40.645801
461 - 0:24:41.954807
451 - 0:24:43.628955
441 - 0:24:45.246469
431 - 0:24:46.662228
421 - 0:24:48.192343
411 - 0:24:49.839395
401 - 0:24:51.513340
391 - 0:24:53.153668
381 - 0:24:54.836501
371 - 0:24:56.372331
361 - 0:24:58.034510
351 - 0:24:59.714655
341 - 0:25:01.401695
331 - 0:25:03.026082
321 - 0:25:04.657724
311 - 0:25:06.324783
301 - 0:25:07.946133
291 - 0:25:09.347907
281 - 0:25:11.000936
271 - 0:25:12.640046
261 - 0:25:14.281258
251 - 0:25:15.939942
241 - 0:25:17.651838
231 - 0:25:19.307522
221 - 0:25:20.931354
211 - 0:25:22.589152
201 - 0:25:24.223949
191 - 0:25:25.868513
181 - 0:25:27.540003
171 - 0:25:29.198259
161 - 0:25:30.492486
151 - 0:25:31.778539
141 - 0:25:33.292928
131 - 0:25:34.966223
121 - 0:25:36.623493
111 - 0:25:38.276205
101 - 0:25:39.996527
91 - 0:25:41.374338
81 - 0:25:43.011192
71 - 0:25:44.639656
61 - 0:25:46.293672
51 - 0:25:47.954129
41 - 0:25:49.663463
31 - 0:25:51.333620
21 - 0:25:52.842479
11 - 0:25:54.489094
1 - 0:25:56.109371
0 - 0:25:56.431443


Basically the only conclusion I can draw is that it is a case of the bind 
params.

So I know this is not recommended, but to retain DB neutrality can I take 
the output from query.statement.compile(dialect=s.bind.dialect, 
compile_kwargs={"literal_binds": True}) and feed that output back directly 
to execute?


On Thursday, August 31, 2017 at 6:41:17 AM UTC-4, Ken M wrote:
>
> I tried a batch size as small as ten and each step took about 2 seconds. I 
> set it up to show me the time between each batch for that run.  Lowering 
> the batch size actually seems to worsen performance. Which is not what I 
> expected. 
>
> Sent from my iPad 
>
>



Re: [sqlalchemy] Updating pg8000 dialect following new release of pg8000

2017-08-31 Thread Mike Bayer
On Thu, Aug 31, 2017 at 7:02 AM, Tony Locke  wrote:
> There's a new release (1.11.0) of the pg8000 driver for PostgreSQL. It's a
> pure-python driver, and it already has a dialect for SQLAlchemy. This latest
> release is not backwardly compatible with the previous release, and I'm
> trying to modify the dialect accordingly. The main change is that
> connections and cursors are no longer threadsafe. In DB-API terms it has a
> threadsafety value of 1 (Threads may share the module, but not connections).
>
> So the first problem I ran into was in the on_connect() method in the
> dialect. It referred to the 'unicode' keyword, which caused a problem under
> Python 3. So I deleted the following:
>
>  def on_connect(conn):
>  conn.py_types[quoted_name] = conn.py_types[unicode]
>  fns.append(on_connect)
>

that was a recent fix for a regression:

https://github.com/zzzeek/sqlalchemy/commit/03560c4b83308719067ec635662c35f9a437fb7f

it is fixed in

https://github.com/zzzeek/sqlalchemy/commit/d0470e296ea589620c94d8f2dd37e94b8f03842a


> just to get the tests going. The next problem is:
>
> test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::()::test_alias_pathing
>
> which I think fails due to a threading problem. It seems that somehow the
> DB-API connection is shared between threads, which isn't supported any more
> with pg8000. So my question is, is it the case that:
>
> a. DB-API connections must be threadsafe to work with SQLAlchemy.

Not at all; SQLAlchemy's position is that its own Session and Connection,
which ultimately refer to the DBAPI connection, are themselves not
threadsafe.

> b. There's something wrong with the test.

The memusage tests are often omitted from normal testing as they are
extremely process/memory intensive, but they don't spawn any threads.
In the past few weeks the memusage suite has been altered so that each
memusage test is run in a separate *process*, however, so there is some
concurrency going on.   When the new process is created, the test makes
a new connection pool so that it should not refer to any database
connections that were transferred to the child fork; however, it also
doesn't try to close them or anything else - they should be totally
ignored.   However, if pg8000 is tracking some kind of global state,
like a collection of prepared statements, that state needs to travel
across the process boundary as well without impacting the parent
process even if the child process ends.

The failure can be isolated by doing a pdb like this:

diff --git a/test/aaa_profiling/test_memusage.py
b/test/aaa_profiling/test_memusage.py
index 3181cfe61..ff600b85d 100644
--- a/test/aaa_profiling/test_memusage.py
+++ b/test/aaa_profiling/test_memusage.py
@@ -636,6 +636,8 @@ class MemUsageWBackendTest(EnsureZeroed):
 try:
 go()
 finally:
+import pdb
+pdb.set_trace()
 metadata.drop_all()
 clear_mappers()

This brings me right to a clean state where "next" will produce the
error.   Looking at Postgresql processes within the block, there are
no open transactions to the DB.  If you pdb right here, you can poke
around to see what state might be present.




> c. Something else.
>
> Thanks for your help!
>



[sqlalchemy] Updating pg8000 dialect following new release of pg8000

2017-08-31 Thread Tony Locke
There's a new release (1.11.0) of the pg8000 driver for PostgreSQL. It's a 
pure-python driver, and it already has a dialect for SQLAlchemy. This 
latest release is not backwardly compatible with the previous release, and 
I'm trying to modify the dialect accordingly. The main change is that 
connections and cursors are no longer threadsafe. In DB-API terms it has a 
threadsafety value of 1 (Threads may share the module, but not connections).
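For a driver at threadsafety level 1, the usual pattern on the SQLAlchemy side is to share the Engine (and its pool) between threads while each thread checks out its own connection.  A minimal sketch, with SQLite standing in for a pg8000 URL:

```python
import threading

import sqlalchemy as sa

# The Engine and its pool are safe to share across threads; individual
# DBAPI connections are not, so each thread checks out its own.
engine = sa.create_engine("sqlite://")  # stands in for a postgresql+pg8000 URL

results = []
lock = threading.Lock()

def worker():
    # engine.connect() hands this thread a distinct pooled DBAPI connection.
    with engine.connect() as conn:
        value = conn.execute(sa.text("select 1")).scalar()
    with lock:
        results.append(value)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

As long as no single Connection or Session object crosses a thread boundary, a threadsafety-1 driver should be fine under this pattern.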

So the first problem I ran into was in the on_connect() method in the 
dialect. It referred to the 'unicode' keyword, which caused a problem 
under Python 3. So I deleted the following:

 def on_connect(conn):
 conn.py_types[quoted_name] = conn.py_types[unicode]
 fns.append(on_connect)

just to get the tests going. The next problem is:

test/aaa_profiling/test_memusage.py::MemUsageWBackendTest_postgresql+pg8000_9_5_8::()::test_alias_pathing

which I think fails due to a threading problem. It seems that somehow the 
DB-API connection is shared between threads, which isn't supported any more 
with pg8000. So my question is, is it the case that:

a. DB-API connections must be threadsafe to work with SQLAlchemy.
b. There's something wrong with the test.
c. Something else.

Thanks for your help!



Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken M
I tried a batch size as small as ten and each step took about 2 seconds. I set 
it up to show me the time between each batch for that run.  Lowering the batch 
size actually seems to worsen performance, which is not what I expected.

Sent from my iPad

> On Aug 31, 2017, at 5:42 AM, Simon King  wrote:
> 
> I'm pretty sure the bad performance is due to pyodbc (or maybe SQL
> Server) not handling *huge* numbers of bind parameters gracefully. You
> are generating a query with (batch_size * n_primary_key_columns) bind
> parameters, which even in your original version meant 2000 bind
> parameters.
> 
> Try with batch sizes of 10, 50 and 100. You'll run more queries but
> I suspect each query will perform better.
> 
> Simon
> 
>> On Wed, Aug 30, 2017 at 10:02 PM, Ken MacKenzie  wrote:
>> With a batch size of 1 based on the original selection, no improvement.
>> 
>> I am going to see what sql this is creating.
>> 
>> 2328928 function calls (2085124 primitive calls) in 1084.925 seconds
>> 
>> Ordered by: internal time
>> 
>> ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
>> 26  1074.733  41.336  1074.733  41.336  {method 'execute' of 'pyodbc.Cursor' objects}
>> 2  0.871  0.436  0.872  0.436  {built-in method connect}
>> 349128/348523  0.556  0.000  0.573  0.000  {built-in method isinstance}
>> 26  0.502  0.019  0.502  0.019  {method 'close' of 'pyodbc.Cursor' objects}
>> 20  0.485  0.024  0.485  0.024  {method 'commit' of 'pyodbc.Connection' objects}
>> 161062/161060  0.317  0.000  0.318  0.000  {built-in method getattr}
>> 19123  0.259  0.000  0.789  0.000  elements.py:861(__init__)
>> 9567  0.258  0.000  0.258  0.000  {method 'fetchone' of 'pyodbc.Cursor' objects}
>> 76492/19123  0.248  0.000  2.842  0.000  operators.py:297(__eq__)
>> 67012/24  0.236  0.000  2.601  0.108  visitors.py:75(_compiler_dispatch)
>> 19123  0.234  0.000  0.424  0.000  elements.py:2927(__init__)
>> 9582  0.213  0.000  1.104  0.000  elements.py:1860(_construct)
>> 19132  0.205  0.000  0.435  0.000  compiler.py:676(visit_column)
>> 19123  0.189  0.000  1.806  0.000  compiler.py:1040(_generate_generic_binary)
>> 19123  0.183  0.000  1.263  0.000  default_comparator.py:290(_check_literal)
>> 57389/19143  0.173  0.000  2.732  0.000  {built-in method eq}
>> 19123  0.171  0.000  1.912  0.000  default_comparator.py:22(_boolean_compare)
>> 28684  0.169  0.000  0.346  0.000  operators.py:1007(is_precedent)
>> 19126  0.143  0.000  0.352  0.000  elements.py:3962(__new__)
>> 19123  0.129  0.000  0.795  0.000  compiler.py:1164(visit_bindparam)
>> 19123  0.128  0.000  0.918  0.000  elements.py:3818(_bind_param)
>> 19153  0.126  0.000  0.581  0.000  base.py:1128(decorate)
>> 1  0.125  0.125  1084.925  1084.925  del_test.py:1()
>> 116845  0.121  0.000  0.121  0.000  {method 'get' of 'dict' objects}
>> 38264/9581  0.115  0.000  2.567  0.000  compiler.py:806()
>> 19123  0.113  0.000  2.048  0.000  compiler.py:1006(visit_binary)
>> 19125  0.113  0.000  0.174  0.000  compiler.py:1265(_process_anon)
>> 19125  0.109  0.000  0.451  0.000  compiler.py:1246(_truncated_identifier)
>> 19123  0.108  0.000  2.457  0.000  annotation.py:100(__eq__)
>> 19123  0.107  0.000  2.190  0.000  base.py:1306(visit_binary)
>> 19123  0.104  0.000  2.015  0.000  type_api.py:60(operate)
>> 38283  0.101  0.000  0.167  0.000  compiler.py:2901(quote)
>> 19123  0.095  0.000  0.577  0.000  compiler.py:1233(_truncate_bindparam)
>> 19206  0.094  0.000  0.163  0.000  elements.py:3918(__new__)
>> 19125  0.093  0.000  0.323  0.000  elements.py:4073(apply_map)
>> 38246  0.091  0.000  0.091  0.000  elements.py:640(self_group)
>> 88460  0.088  0.000  0.088  0.000  {method 'append' of 'list' objects}
>> 19123  0.084  0.000  2.621  0.000  properties.py:269(operate)
>> 38264/9581  0.084  0.000  2.586  0.000  compiler.py:804()
>> 19123  0.082  0.000  0.732  0.000  annotation.py:78(_compiler_dispatch)
>> 12703/3142  0.076  0.000  2.605  0.001  {method 'join' of 'str' objects}
>> 19123  0.075  0.000  0.129  0.000  compiler.py:959(_get_operator_dispatch)
>> 1  0.071  0.071  1082.250  1082.250  db_base.py:172(dict_delete)
>> 19123  0.070  0.000  0.089  0.000  compiler.py:1271(bindparam_string)
>> 19123  0.070  0.000  2.085  0.000  :1()
>> 9581  0.067  0.000  0.520  0.000  elements.py:1883()
>> 9561  0.064  0.000  2.905  0.000  db_base.py:208()
>> 23611/23323  0.062  0.000  0.067  0.000  {built-in method 

Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Ken M
Yes, the double equals was a typo, sorry about that.

Your point about non-varchar PKs is why I will double-check other DBs. SQL 
Server lets it fly, but I think you are right about Postgres.



Sent from my iPad

> On Aug 31, 2017, at 5:36 AM, Simon King  wrote:
> 
> The generic version would look something like this:
> 
> table = cls.__table__
> pkconditions = []
> for pk in cls.SQL_PK:
>    pkconditions.append(table.c[pk] == sa.bindparam(pk))
> condition = sa.and_(*pkconditions)
> statement = sa.delete(table, whereclause=condition)
> batchparams = [dict(zip(cls.SQL_PK, v)) for v in id_batch]
> session.execute(statement, batchparams)
> 
> As for the portability of the textual version: your original code
> produced fragments like this:
> 
>  field == 'value'
> 
> a) Is the double-equals a typo, or is it syntax that SQL Server
> expects? I've never used double-equals in MySQL.
> b) Are the primary key fields always strings? You later said that they
> were, but if you ever had an integer primary key column, comparing it
> against a quoted value would be an error on postgres (I believe).
> c) Can the values that you are comparing against ever contain single
> quotes? That would break your query and potentially leave you open to
> SQL injection.
> d) Are your values unicode strings or byte strings? If unicode, does
> the encoding match what the database expects?
> 
> If none of those are issues for you, the textual version is probably
> pretty safe.
> 
> Simon
> 
>> On Wed, Aug 30, 2017 at 5:30 PM, Ken MacKenzie  wrote:
>> Not sure how I would iterate through a non-predetermined number of primary
>> keys.
>> 
>> I guess part of me is wondering: although textual SQL is not inherently
>> DB neutral, how different between the DB targets is the where field =
>> 'value' syntax?
>> 
>>> On Wednesday, August 30, 2017 at 12:07:52 PM UTC-4, Simon King wrote:
>>> 
>>> You could also try using executemany:
>>> 
>>> 
>>> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
>>> 
>>> I think it would look something like this:
>>> 
>>> table = cls.__table__
>>> condition = sa.and_(
>>>table.c.pk1 == sa.bindparam('pk1'),
>>>table.c.pk2 == sa.bindparam('pk2'),
>>> )
>>> statement = sa.delete(table, whereclause=condition)
>>> batchparams = [{'pk1': v[0], 'pk2': v[1]} for v in id_batch]
>>> session.execute(statement, batchparams)
>>> 
>>> Simon
>>> 
 On Wed, Aug 30, 2017 at 4:28 PM, Ken MacKenzie  wrote:
 After the current sorted profile finishes I will revert to the textual
 version and run a profile on that.  I expect another 10-15 minutes for
 this
 to finish right now.
 
 At present the batch size is set to 1000, total record count is just
 over
 9000 in these tests.
 
 The reason for 1000 was at first I was looking at doing this as a
 tuple_(fld, fld).in_((val, val),(val,val)) format.  The 1000 should keep
 me
 under most DB restrictions on the in statement.
 
 However since SQL Server does not seem to support the tuple_ usage I
 reverted to this method.
 
 I technically have one more method and that is a concat_ in_ where I
 concat
 the fields.
 
 Other specifics, the table in question has 2 fields for the PK, both are
 varchar, one length 3, the other length 10.  There are 5 non key fields,
 3
 short varchars, one decimal at 14,2 precision and one varchar(800) which
 contains description text.
 
 Total record count of the table before any deletion is about 1.05
 million.
 
 Python version is 3.4.5, running on a modest CentOS desktop and to be
 fair
 the SQL Server instance is sub optimal for development.
 
> On Wednesday, August 30, 2017 at 11:18:13 AM UTC-4, Simon King wrote:
> 
> It would be interesting to see the profile of the textual SQL version.
> It looks like most of the time is being spent inside pyodbc, rather
> than SQLAlchemy, so I guess it must be something to do with the
> processing of bind parameters. How many parameters are being sent in
> per query? ie. what is len(id_batch) * len(cls.SQL_PK)?
> 
> You could try playing with your batch sizes to see what sort of effect
> that has.
> 
> Simon
 

Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Simon King
I'm pretty sure the bad performance is due to pyodbc (or maybe SQL
Server) not handling *huge* numbers of bind parameters gracefully. You
are generating a query with (batch_size * n_primary_key_columns) bind
parameters, which even in your original version meant 2000 bind
parameters.

Try with batch sizes of 10, 50 and 100. You'll run more queries but
I suspect each query will perform better.

Simon
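The batched-bindparam approach under discussion can be sketched end to end as follows.  This is illustrative only: the table, columns, and key values are made up, and SQLite stands in for the mssql+pyodbc target:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Illustrative stand-in for the real table: a two-column varchar PK.
table = sa.Table(
    "item", metadata,
    sa.Column("pk1", sa.String(3), primary_key=True),
    sa.Column("pk2", sa.String(10), primary_key=True),
    sa.Column("descr", sa.String(800)),
)

engine = sa.create_engine("sqlite://")  # stands in for the SQL Server URL
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(table.insert(), [
        {"pk1": "A", "pk2": "1", "descr": "x"},
        {"pk1": "B", "pk2": "2", "descr": "y"},
        {"pk1": "C", "pk2": "3", "descr": "z"},
    ])

# One DELETE with a single bindparam per PK column...
pk_cols = ("pk1", "pk2")
condition = sa.and_(*(table.c[name] == sa.bindparam(name) for name in pk_cols))
stmt = sa.delete(table).where(condition)

# ...executed once per batch with a list of parameter dicts (executemany),
# so each statement carries only len(pk_cols) bind parameters, not
# batch_size * len(pk_cols).
keys = [("A", "1"), ("B", "2")]
batch_size = 50
with engine.begin() as conn:
    for i in range(0, len(keys), batch_size):
        params = [dict(zip(pk_cols, k)) for k in keys[i:i + batch_size]]
        conn.execute(stmt, params)

with engine.connect() as conn:
    remaining = conn.execute(
        sa.select(sa.func.count()).select_from(table)).scalar()
```

Because the statement text never changes, the driver can prepare it once and stream parameter sets, which is exactly what the huge-bind-parameter OR-of-ANDs version prevents.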

On Wed, Aug 30, 2017 at 10:02 PM, Ken MacKenzie  wrote:
> With a batch size of 1 based on the original selection, no improvement.
>
> I am going to see what sql this is creating.
>
>  2328928 function calls (2085124 primitive calls) in 1084.925 seconds
>
>    Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        26 1074.733   41.336 1074.733   41.336 {method 'execute' of 'pyodbc.Cursor' objects}
>         2    0.871    0.436    0.872    0.436 {built-in method connect}
> 349128/348523    0.556    0.000    0.573    0.000 {built-in method isinstance}
>        26    0.502    0.019    0.502    0.019 {method 'close' of 'pyodbc.Cursor' objects}
>        20    0.485    0.024    0.485    0.024 {method 'commit' of 'pyodbc.Connection' objects}
> 161062/161060    0.317    0.000    0.318    0.000 {built-in method getattr}
>     19123    0.259    0.000    0.789    0.000 elements.py:861(__init__)
>      9567    0.258    0.000    0.258    0.000 {method 'fetchone' of 'pyodbc.Cursor' objects}
> 76492/19123    0.248    0.000    2.842    0.000 operators.py:297(__eq__)
>  67012/24    0.236    0.000    2.601    0.108 visitors.py:75(_compiler_dispatch)
>     19123    0.234    0.000    0.424    0.000 elements.py:2927(__init__)
>      9582    0.213    0.000    1.104    0.000 elements.py:1860(_construct)
>     19132    0.205    0.000    0.435    0.000 compiler.py:676(visit_column)
>     19123    0.189    0.000    1.806    0.000 compiler.py:1040(_generate_generic_binary)
>     19123    0.183    0.000    1.263    0.000 default_comparator.py:290(_check_literal)
> 57389/19143    0.173    0.000    2.732    0.000 {built-in method eq}
>     19123    0.171    0.000    1.912    0.000 default_comparator.py:22(_boolean_compare)
>     28684    0.169    0.000    0.346    0.000 operators.py:1007(is_precedent)
>     19126    0.143    0.000    0.352    0.000 elements.py:3962(__new__)
>     19123    0.129    0.000    0.795    0.000 compiler.py:1164(visit_bindparam)
>     19123    0.128    0.000    0.918    0.000 elements.py:3818(_bind_param)
>     19153    0.126    0.000    0.581    0.000 base.py:1128(decorate)
>         1    0.125    0.125 1084.925 1084.925 del_test.py:1(<module>)
>    116845    0.121    0.000    0.121    0.000 {method 'get' of 'dict' objects}
> 38264/9581    0.115    0.000    2.567    0.000 compiler.py:806()
>     19123    0.113    0.000    2.048    0.000 compiler.py:1006(visit_binary)
>     19125    0.113    0.000    0.174    0.000 compiler.py:1265(_process_anon)
>     19125    0.109    0.000    0.451    0.000 compiler.py:1246(_truncated_identifier)
>     19123    0.108    0.000    2.457    0.000 annotation.py:100(__eq__)
>     19123    0.107    0.000    2.190    0.000 base.py:1306(visit_binary)
>     19123    0.104    0.000    2.015    0.000 type_api.py:60(operate)
>     38283    0.101    0.000    0.167    0.000 compiler.py:2901(quote)
>     19123    0.095    0.000    0.577    0.000 compiler.py:1233(_truncate_bindparam)
>     19206    0.094    0.000    0.163    0.000 elements.py:3918(__new__)
>     19125    0.093    0.000    0.323    0.000 elements.py:4073(apply_map)
>     38246    0.091    0.000    0.091    0.000 elements.py:640(self_group)
>     88460    0.088    0.000    0.088    0.000 {method 'append' of 'list' objects}
>     19123    0.084    0.000    2.621    0.000 properties.py:269(operate)
> 38264/9581    0.084    0.000    2.586    0.000 compiler.py:804()
>     19123    0.082    0.000    0.732    0.000 annotation.py:78(_compiler_dispatch)
> 12703/3142    0.076    0.000    2.605    0.001 {method 'join' of 'str' objects}
>     19123    0.075    0.000    0.129    0.000 compiler.py:959(_get_operator_dispatch)
>         1    0.071    0.071 1082.250 1082.250 db_base.py:172(dict_delete)
>     19123    0.070    0.000    0.089    0.000 compiler.py:1271(bindparam_string)
>     19123    0.070    0.000    2.085    0.000 :1()
>      9581    0.067    0.000    0.520    0.000 elements.py:1883()
>      9561    0.064    0.000    2.905    0.000 db_base.py:208()
> 23611/23323    0.062    0.000    0.067    0.000 {built-in method hasattr}
>     28705    0.061    0.000    0.159    0.000 elements.py:4216(_expression_literal_as_text)
>     28746    0.058    0.000    0.098    0.000 elements.py:4220(_literal_as_text)
>     19123    0.058    0.000    0.130    0.000 type_api.py:504(coerce_compared_value)
>      9562    0.056    0.000    0.416    0.000 result.py:1156(fetchone)
>     19125    0.056    0.000    0.230

Re: [sqlalchemy] Is Textual SQL DB neutral

2017-08-31 Thread Simon King
The generic version would look something like this:

table = cls.__table__
pkconditions = []
for pk in cls.SQL_PK:
    pkconditions.append(table.c[pk] == sa.bindparam(pk))
condition = sa.and_(*pkconditions)
statement = sa.delete(table, whereclause=condition)
batchparams = [dict(zip(cls.SQL_PK, v)) for v in id_batch]
session.execute(statement, batchparams)
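[Editorial note: a runnable end-to-end version of the same pattern against an in-memory SQLite database, for reference. The table, columns, and test data are invented for the sketch; a Core connection stands in for the thread's session, and `.where()` is used instead of the `whereclause=` keyword so it runs on both SQLAlchemy 1.4 and 2.x:]

```python
import sqlalchemy as sa

# Hypothetical table with a two-column composite primary key,
# mirroring the varchar(3)/varchar(10) key described in the thread.
metadata = sa.MetaData()
items = sa.Table(
    "items", metadata,
    sa.Column("pk1", sa.String(3), primary_key=True),
    sa.Column("pk2", sa.String(10), primary_key=True),
    sa.Column("descr", sa.String(800)),
)

SQL_PK = ("pk1", "pk2")

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

id_batch = [("A", "1"), ("A", "2"), ("B", "1")]

with engine.begin() as conn:
    conn.execute(
        items.insert(),
        [{"pk1": p1, "pk2": p2, "descr": "x"}
         for p1, p2 in id_batch + [("C", "9")]],
    )
    # One compiled DELETE; the list of parameter dicts triggers
    # DBAPI executemany under the hood.
    condition = sa.and_(*(items.c[pk] == sa.bindparam(pk) for pk in SQL_PK))
    statement = sa.delete(items).where(condition)
    batchparams = [dict(zip(SQL_PK, v)) for v in id_batch]
    conn.execute(statement, batchparams)
    remaining = conn.execute(items.select()).fetchall()

print(remaining)  # only the ("C", "9") row should survive
```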

As for the portability of the textual version: your original code
produced fragments like this:

  field == 'value'

a) Is the double-equals a typo, or is it syntax that SQL Server
expects? I've never used double-equals in MySQL.
b) Are the primary key fields always strings? You later said that they
were, but if you ever had an integer primary key column, comparing it
against a quoted value would be an error on postgres (I believe).
c) Can the values that you are comparing against ever contain single
quotes? That would break your query and potentially leave you open to
SQL injection.
d) Are your values unicode strings or byte strings? If unicode, does
the encoding match what the database expects?

If none of those are issues for you, the textual version is probably
pretty safe.
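[Editorial note: point (c), embedded quotes, is the one that bites first with literal interpolation. A DBAPI-level illustration using only the stdlib sqlite3 module — table and value are invented — of why bound parameters sidestep both the quoting bug and the injection risk:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES ('O''Brien')")  # doubled quote = literal '

value = "O'Brien"

# Literal interpolation breaks on the embedded single quote (point c),
# and is an injection vector besides:
try:
    conn.execute("SELECT * FROM t WHERE field = '%s'" % value)
    interpolation_ok = True
except sqlite3.OperationalError:
    interpolation_ok = False

# A bound parameter lets the driver handle quoting and encoding:
rows = conn.execute("SELECT * FROM t WHERE field = ?", (value,)).fetchall()
print(interpolation_ok, rows)
```

SQLAlchemy's `sa.text("... WHERE field = :value")` gives the same protection portably, since the dialect translates `:value` into whatever paramstyle the DBAPI expects (`?`, `%s`, `:name`, ...).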

Simon

On Wed, Aug 30, 2017 at 5:30 PM, Ken MacKenzie  wrote:
> Not sure how I would iterate through a non predetermined number of primary
> keys.
>
> I guess part of me is wondering: although textual SQL is not inherently
> DB neutral, how different is the where field = 'value' syntax between
> the DB targets?
>
> On Wednesday, August 30, 2017 at 12:07:52 PM UTC-4, Simon King wrote:
>>
>> You could also try using executemany:
>>
>>
>> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
>>
>> I think it would look something like this:
>>
>> table = cls.__table__
>> condition = sa.and_(
>> table.c.pk1 == sa.bindparam('pk1'),
>> table.c.pk2 == sa.bindparam('pk2'),
>> )
>> statement = sa.delete(table, whereclause=condition)
>> batchparams = [{'pk1': v[0], 'pk2': v[1]} for v in id_batch]
>> session.execute(statement, batchparams)
>>
>> Simon
>>
>> On Wed, Aug 30, 2017 at 4:28 PM, Ken MacKenzie  wrote:
>> > After the current sorted profile finishes I will revert to the textual
>> > version and run a profile on that.  I expect another 10-15 minutes for
>> > this
>> > to finish right now.
>> >
>> > At present the batch size is set to 1000, total record count is just
>> > over
>> > 9000 in these tests.
>> >
>> > The reason for 1000 was at first I was looking at doing this as a
>> > tuple_(fld, fld).in_((val, val),(val,val)) format.  The 1000 should keep
>> > me
>> > under most DB restrictions on the in statement.
>> >
>> > However since SQL Server does not seem to support the tuple_ usage I
>> > reverted to this method.
>> >
>> > I technically have one more method and that is a concat_ in_ where I
>> > concat
>> > the fields.
>> >
>> > Other specifics, the table in question has 2 fields for the PK, both are
>> > varchar, one length 3, the other length 10.  There are 5 non key fields,
>> > 3
>> > short varchars, one decimal at 14,2 precision and one varchar(800) which
>> > contains description text.
>> >
>> > Total record count of the table before any deletion is about 1.05
>> > million.
>> >
>> > Python version is 3.4.5, running on a modest CentOS desktop and to be
>> > fair
>> > the SQL Server instance is sub optimal for development.
>> >
>> > On Wednesday, August 30, 2017 at 11:18:13 AM UTC-4, Simon King wrote:
>> >>
>> >> It would be interesting to see the profile of the textual SQL version.
>> >> It looks like most of the time is being spent inside pyodbc, rather
>> >> than SQLAlchemy, so I guess it must be something to do with the
>> >> processing of bind parameters. How many parameters are being sent in
>> >> per query? ie. what is len(id_batch) * len(cls.SQL_PK)?
>> >>
>> >> You could try playing with your batch sizes to see what sort of effect
>> >> that has.
>> >>
>> >> Simon
>> >
>

Re: [sqlalchemy] One-to-one relationship with delete-orphan and update breaks UniqueConstraint

2017-08-31 Thread Levon Saldamli
Thank you for your answer. I think detaching and doing a session.flush() is 
a good enough solution for now.
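[Editorial note: for anyone landing here later, the detach-and-flush workaround might look like this. A minimal sketch assuming SQLAlchemy 1.4+; the models follow the thread's example, with the email addresses invented:]

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = sa.Column(sa.Integer, primary_key=True)
    address = relationship(
        "Address", uselist=False, back_populates="user",
        cascade="all, delete-orphan",
    )

class Address(Base):
    __tablename__ = "addresses"
    id = sa.Column(sa.Integer, primary_key=True)
    email_address = sa.Column(sa.String, nullable=False)
    user_id = sa.Column(sa.Integer, sa.ForeignKey("users.id"))
    user = relationship("User", back_populates="address")
    __table_args__ = (sa.UniqueConstraint("user_id"),)

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

user = User(address=Address(email_address="old@example.com"))
session.add(user)
session.commit()

# Workaround: detach the old address and flush so its DELETE is
# emitted first; only then attach the replacement, so the row holding
# the unique user_id is gone before the new INSERT runs.
user.address = None
session.flush()
user.address = Address(email_address="new@example.com")
session.commit()

print(user.address.email_address)
```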

On Wednesday, 30 August 2017 at 16:31:03 UTC+2, Mike Bayer wrote:
>
> On Wed, Aug 30, 2017 at 7:52 AM, Levon Saldamli  > wrote: 
> > I'm trying to have two objects with a one-to-one relationship, e.g. User
> > and Address, with a UniqueConstraint for user.id in the address table
> > (names changed from the real code). But when user.address is updated,
> > the commit fails because the UniqueConstraint is broken, probably
> > because SqlAlchemy inserts the new address into the table before
> > removing the old one. If I set the address to None, commit and then set
> > to a new address, it works fine. Is this a known issue or limitation,
> > and is there a possibility that this will be supported in the future?
>
> You've hit a classic issue, which is known as the "DELETE before 
> INSERT" problem: 
>
>
> https://bitbucket.org/zzzeek/sqlalchemy/issues/2501/the-delete-before-insert-problem
> 
>
> There is in fact a solution attached to that issue in the form of a 
> new option that would be added to the mapper() where you provide a set 
> of columns that need to be maintained as unique. 
>
> The reason it hasn't found its way in yet is because it's not clear 
> how well received this option would be, what the real demand is for it 
> as it's a fair bit of work, or how well it works in real world use 
> cases - you can see in the comments that everyone who tries it out 
> still has some other aspects of their problem it does not solve, and 
> then for whatever reason seem to move on from it.   I've yet to have 
> someone say hey this solved my problem. 
>
> This is partially because the workaround is pretty simple, you just 
> detach the first object, do session.flush() to delete it, then attach 
> the new one. 
>
> Let me know which approach you'd like to proceed with. 
>
>
> > 
> > Full example with the failing commit below. SqlAlchemy version used is 
> > 1.1.13. The problem is reproducible with both sqlite and postgresql. 
> > 
> > from sqlalchemy import create_engine 
> > from sqlalchemy.ext.declarative import declarative_base 
> > from sqlalchemy import Column, Integer, String 
> > from sqlalchemy import ForeignKey 
> > from sqlalchemy.orm import relationship 
> > from sqlalchemy import UniqueConstraint 
> > 
> > engine = create_engine('sqlite:///:memory:', echo=True) 
> > 
> > 
> > Base = declarative_base() 
> > 
> > class User(Base): 
> > __tablename__ = 'users' 
> > id = Column(Integer, primary_key=True) 
> > name = Column(String) 
> > fullname = Column(String) 
> > 
> > address = relationship("Address", uselist=False, 
> back_populates="user", 
> > cascade="all, delete-orphan") 
> > 
> > def __repr__(self): 
> > return "<User(name='%s', fullname='%s', id=%s)>" % ( 
> > self.name, self.fullname, self.id) 
> > 
> > 
> > class Address(Base): 
> > __tablename__ = 'addresses' 
> > id = Column(Integer, primary_key=True) 
> > email_address = Column(String, nullable=False) 
> > user_id = Column(Integer, ForeignKey('users.id')) 
> > 
> > user = relationship("User", back_populates="address") 
> > __table_args__ = (UniqueConstraint('user_id', 
> > name='_one_unique_address_per_user_uc'),) 
> > 
> > def __repr__(self): 
> > return "<Address(email_address='%s')>" % self.email_address 
> > 
> > 
> > Base.metadata.create_all(engine) 
> > 
> > ed_user = User(name='ed', fullname='Ed Jones') 
> > 
> > from sqlalchemy.orm import sessionmaker 
> > Session = sessionmaker(bind=engine) 
> > 
> > 
> > ed_user.address = Address(email_address='f..._at_bar.com ') 
>
> > 
> > session = Session() 
> > session.add(ed_user) 
> > session.commit() 
> > 
> > ed_user.address = Address(email_address='new_address_at_bar.com') 
> > session.add(ed_user) 
> > session.commit() 
> > 
> > 
> > 
> > 
> > 
>
