[sqlalchemy] Re: Assert ResultProxy

2019-12-02 Thread sumau
Thanks! And to show the differences more clearly:

for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == 'replace':
        for i in range(i1, i2):
            print(seq1[i])
        for j in range(j1, j2):
            print(seq2[j])
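
A fuller sketch handling one-sided differences as well (seq1, seq2 and sm as 
defined in Steven's message below):

for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == 'replace':
        # rows that differ between the two results
        for i in range(i1, i2):
            print('result1:', seq1[i])
        for j in range(j1, j2):
            print('result2:', seq2[j])
    elif tag == 'delete':
        # rows present only in the first result
        for i in range(i1, i2):
            print('only in result1:', seq1[i])
    elif tag == 'insert':
        # rows present only in the second result
        for j in range(j1, j2):
            print('only in result2:', seq2[j])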


Soumaya

On Monday, 2 December 2019 16:54:16 UTC, Steven James wrote:
>
> In case you want more details about the differences, you could also use 
> difflib...
>
> from difflib import SequenceMatcher
>
> seq1 = [tuple(row.values()) for row in resultproxy1]
> seq2 = [tuple(row.values()) for row in resultproxy2]
>
> sm = SequenceMatcher(a=seq1, b=seq2, autojunk=False)
> print(sm.get_opcodes())
> print(f'similarity: {sm.ratio()}')
>
> assert sm.ratio() == 1  # example to ensure results are equivalent
> assert sm.ratio() == 1, sm.get_opcodes()  # pytest syntax to show the opcodes if the assertion fails
>
> Steven James
>
> On Friday, 29 November 2019 09:13:23 UTC-5, sumau wrote:
>>
>> Hello
>>
>> I think my original question was too generic so rephrasing... Is there a 
>> way in sqlalchemy to:
>>
>> 1. Assert a ResultProxy against an expected ResultProxy (or a list of 
>>    RowProxies against an expected list of RowProxies) 
>> 2. Show any differences
>>
>> I wanted to check first before writing my own script :-)
>>
>> Regards
>> S
>>
>> On Friday, 22 November 2019 10:50:54 UTC, sumau wrote:
>>>
>>> Hello
>>>
>>> I would like to assert the contents of tables in my PG schema, i.e. make 
>>> sure they contain the data I'm expecting.
>>>
>>> I am aware of various options:
>>>
>>> 1) Compare the actual and expected tables using a sql query, 
>>> orchestrated by sqlalchemy (i.e. create the actual and expected tables in 
>>> DB, run the sql comparison script, return the output)
>>> 2) Load the actual tables as tuples and compare them with expected 
>>> tuples using something like assert_result
>>>
>>> https://github.com/sqlalchemy/sqlalchemy/blob/d933ddd503a1ca0a7c562c51c503139c541e707e/lib/sqlalchemy/testing/assertions.py#L465
>>> 3) Load the actual tables as dataframes and compare them with expected 
>>> dataframes using pandas assert_frame_equal
>>>
>>> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.testing.assert_frame_equal.html
>>>
>>> Any recommendations / thoughts would be much appreciated, both as to the 
>>> approach and the implementation :-)
>>>
>>



Re: [sqlalchemy] Why would rollback() not be sufficient

2019-12-02 Thread Mike Bayer


On Mon, Dec 2, 2019, at 2:02 PM, Lee Doolan wrote:
> I recently encountered this little snippet of code at the beginning of a 
> transaction
> 
> # I am surprised we need to call both of these, but
> # we do. Neither alone will prevent stale data from
> # persisting in the session.
>  meta.Session.expunge_all()
>  meta.Session.rollback()
> 
> 
> I have looked through the SQLA documentation (1.3) and I cannot find any 
> reason why simply using a rollback() would not suffice here. Maybe it was 
> necessary when the code was written for an older version of SQLA.
> 
> Does anyone have any thoughts on the matter?
> 

Hi there -

expunge_all() and rollback() do different things, and both have a purpose. As 
far as what will "suffice", that depends on exactly what you are attempting to 
accomplish.

I would also note that calling session.close() has the same effect as the two 
statements above.

A typical reason one might want to use both is so that any Python objects 
which are attached to this Session become detached, so that when their 
unloaded attributes are accessed, no SQL statement or implicit transaction is 
started again on the original Session.

Documentation for this process is at 
https://docs.sqlalchemy.org/en/13/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
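
A minimal sketch of the difference, assuming a hypothetical mapped User class:

# rollback() alone: objects stay attached, in the "expired" state
user = session.query(User).first()
session.rollback()
user.name    # attribute access emits a SELECT and implicitly begins
             # a new transaction on the same Session

# expunge_all() + rollback(): objects are detached
user = session.query(User).first()
session.expunge_all()
session.rollback()
user.name    # already-loaded value is returned from the detached object;
             # no SQL and no new transaction on the original Session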









[sqlalchemy] Why would rollback() not be sufficient

2019-12-02 Thread Lee Doolan
I recently encountered this little snippet of code at the beginning of a 
transaction

# I am surprised we need to call both of these, but
# we do.  Neither alone will prevent stale data from
# persisting in the session.
meta.Session.expunge_all()
meta.Session.rollback()


I have looked through the SQLA documentation (1.3) and I cannot find any 
reason why simply using a rollback() would not suffice here. Maybe it was 
necessary when the code was written for an older version of SQLA.

Does anyone have any thoughts on the matter?



[sqlalchemy] Re: Assert ResultProxy

2019-12-02 Thread Steven James
In case you want more details about the differences, you could also use 
difflib...

from difflib import SequenceMatcher

seq1 = [tuple(row.values()) for row in resultproxy1]
seq2 = [tuple(row.values()) for row in resultproxy2]

sm = SequenceMatcher(a=seq1, b=seq2, autojunk=False)
print(sm.get_opcodes())
print(f'similarity: {sm.ratio()}')

assert sm.ratio() == 1  # example to ensure results are equivalent
assert sm.ratio() == 1, sm.get_opcodes()  # pytest syntax to show the opcodes if the assertion fails

Steven James

On Friday, 29 November 2019 09:13:23 UTC-5, sumau wrote:
>
> Hello
>
> I think my original question was too generic so rephrasing... Is there a 
> way in sqlalchemy to:
>
> 1. Assert a ResultProxy against an expected ResultProxy (or a list of 
>    RowProxies against an expected list of RowProxies) 
> 2. Show any differences
>
> I wanted to check first before writing my own script :-)
>
> Regards
> S
>
> On Friday, 22 November 2019 10:50:54 UTC, sumau wrote:
>>
>> Hello
>>
>> I would like to assert the contents of tables in my PG schema, i.e. make 
>> sure they contain the data I'm expecting.
>>
>> I am aware of various options:
>>
>> 1) Compare the actual and expected tables using a sql query, orchestrated 
>> by sqlalchemy (i.e. create the actual and expected tables in DB, run the 
>> sql comparison script, return the output)
>> 2) Load the actual tables as tuples and compare them with expected tuples 
>> using something like assert_result
>>
>> https://github.com/sqlalchemy/sqlalchemy/blob/d933ddd503a1ca0a7c562c51c503139c541e707e/lib/sqlalchemy/testing/assertions.py#L465
>> 3) Load the actual tables as dataframes and compare them with expected 
>> dataframes using pandas assert_frame_equal
>>
>> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.testing.assert_frame_equal.html
>>
>> Any recommendations / thoughts would be much appreciated, both as to the 
>> approach and the implementation :-)
>>
>



Re: [sqlalchemy] 'selectin' loading with composite keys on DB2 for i

2019-12-02 Thread Steven James
Wanted to note: this fix seems to be required to use composite keys with 
SQLite / selectin as well; an adapted registration is sketched below.
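
For reference, a hedged adaptation of the hook from the thread below, 
re-registered for the sqlite dialect (an untested sketch, assuming SQLite 
accepts the same "(a, b) IN (VALUES ...)" rewrite):

import re

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import operators
from sqlalchemy.sql.expression import BinaryExpression

@compiles(BinaryExpression, 'sqlite')
def _comp_binary_sqlite(element, compiler, **kwargs):
    text = compiler.visit_binary(element, **kwargs)
    if element.operator is operators.in_op:
        # rewrite "(a, b) IN ((?, ?), ...)" into "(a, b) IN (VALUES (?, ?), ...)"
        text = re.sub(r"\) IN \(", ") IN (VALUES ", text)
    return text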

On Thursday, 27 June 2019 15:53:44 UTC-4, Steven James wrote:
>
> This Worked!
>
> @compiles(BinaryExpression, 'ibm_db_sa')
> def _comp_binary(element, compiler, **kwargs):
>     text = compiler.visit_binary(element, **kwargs)
>     if element.operator is operators.in_op:
>         text = re.sub(r"\) IN \(", ") IN (VALUES ", text)
>     return text
>
>
> Had to make a couple of changes: 1) the dialect name is 'ibm_db_sa', and 
> 2) (strangely!) DB2 gave me an error that it explicitly disallows parameter 
> markers in VALUES (but only in the single-column-key case). My other tests 
> seem to indicate that this was a misleading error message, but I'm not going 
> to touch it again now that it is working.
>
> Thanks!
>
> On Thursday, 27 June 2019 15:03:01 UTC-4, Mike Bayer wrote:
>>
>>
>>
>> On Thu, Jun 27, 2019, at 2:11 PM, Steven James wrote:
>>
>> Currently, `selectin` loading with composite keys works for me on MySQL 
>> and SQLite. The documentation states that it also works with Postgres. I'm 
>> currently trying to get it working on DB2 (connecting to a DB2 for i, 7.2 
>> system).
>>
>> (the following assumes a table with the primary key consisting of two 
>> columns: a and b)
>>
>> selectin loading currently emits:
>>  `SELECT * FROM table WHERE (table.a, table.b) IN ((?, ?), (?, ?))`
>>
>> For this type of loading to work in DB2 (DB2 for i), the syntax needs to 
>> be:
>> `SELECT * FROM table WHERE (table.a, table.b) IN VALUES (?, ?), (?,?)`
>>
>> Is there any way to implement this without a core change? I'm wondering 
>> if I can override the normal operation of in_() using a custom dialect or 
>> custom default comparator.
>>
>>
>> funny thing is that I'm a Red Hat employee, so assuming RH's merger with 
>> IBM goes through I may eventually be an IBM employee, and maybe they'd like 
>> to give me DB2 things to work on :)   However, that is not the case right 
>> now, and I've done only very limited work with the DB2 driver; as I'm sure 
>> you're aware, the database itself is a beast.
>>
>> So this is something DB2's SQLAlchemy driver will have to add support for 
>> at some point; the selectinload approach is going to become more popular, 
>> and the internal mechanism for "IN" is also moving entirely to a newer 
>> architecture called "expanding". That's probably not important here though.
>>
>> For now, in order to get that "VALUES" in there, you don't need to 
>> "change" Core or work with custom datatypes; there are a variety of event 
>> hooks that can give you access to that part of the SQL at the string 
>> level. I'm able to make this work on Postgresql as well by intercepting 
>> BinaryExpression in the compiler; see the example below.
>>
>> import re
>>
>> from sqlalchemy import Column
>> from sqlalchemy import create_engine
>> from sqlalchemy import ForeignKeyConstraint
>> from sqlalchemy import Integer
>> from sqlalchemy import String
>> from sqlalchemy.ext.compiler import compiles
>> from sqlalchemy.ext.declarative import declarative_base
>> from sqlalchemy.orm import relationship
>> from sqlalchemy.orm import selectinload
>> from sqlalchemy.orm import Session
>> from sqlalchemy.sql import operators
>> from sqlalchemy.sql.expression import BinaryExpression
>>
>>
>> @compiles(BinaryExpression, "postgresql")  # because I'm testing it here
>> @compiles(BinaryExpression, "db2")
>> def _comp_binary(element, compiler, **kw):
>>     text = compiler.visit_binary(element, **kw)
>>     if element.operator is operators.in_op:
>>         text = re.sub(r" IN \(", " IN (VALUES ", text)
>>     return text
>>
>>
>> Base = declarative_base()
>>
>>
>> class A(Base):
>>     __tablename__ = "a"
>>
>>     id = Column(Integer, primary_key=True)
>>     id2 = Column(Integer, primary_key=True)
>>     data = Column(String)
>>
>>     bs = relationship("B")
>>
>>
>> class B(Base):
>>     __tablename__ = "b"
>>     id = Column(Integer, primary_key=True)
>>
>>     a_id = Column(Integer)
>>     a_id2 = Column(Integer)
>>
>>     __table_args__ = (
>>         ForeignKeyConstraint(["a_id", "a_id2"], ["a.id", "a.id2"]),
>>     )
>>
>>
>> e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
>> Base.metadata.drop_all(e)
>> Base.metadata.create_all(e)
>>
>> s = Session(e)
>>
>> s.add(A(id=1, id2=1, bs=[B(), B()]))
>> s.commit()
>>
>> s.query(A).options(selectinload(A.bs)).all()
>>
>>
>>
>>
>>
>>
>> Thanks,
>> Steven James
>>
>>

[sqlalchemy] How to speed up Pandas read_sql (with SQL Alchemy as underlying engine) from Oracle DB?

2019-12-02 Thread Ryan Wolniak


I'd like to optimize querying and converting a list of Oracle tables into 
pandas dataframes.


The eventual goal is to convert to Parquet, write to disk, then upload to 
S3, but for now I just want to focus on the pandas / sqlalchemy / 
parallelism part. My code sort of works, but it's very slow and seems to 
hang after completing 10 tables.


Any advice for speeding things up or alternative suggestions?


import sqlalchemy
from sqlalchemy.orm import sessionmaker, scoped_session
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import Pool
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def process_chunk(chunk, table_name, index):
    table = pa.Table.from_pandas(chunk)
    local_file_name = "./" + table_name + "-" + str(index) + ".parquet"
    pq.write_table(table, local_file_name)

def process_table(table):
    db_session = DBSession()
    # helper function that creates the SQL query
    # (select col1, col2, col3, ..., colX from table)
    query = setup_query(table)
    i = 0
    # is this the right way to use the db_session?
    for chunk in pd.read_sql(query, db_session.bind, chunksize=30):
        process_chunk(chunk, table, i)
        i += 1

oracle_connect_str = "#string_here#"
oracle_engine = sqlalchemy.create_engine(
    oracle_connect_str,
    arraysize=1)

# set up session object to be used by threads
DBSession = scoped_session(
    sessionmaker(
        autoflush=True,
        autocommit=False,
        bind=oracle_engine
    ))

pool = ThreadPool(4)

table_list = ['tbl1','tbl2','tbl3','tbl4','tbl5',...,'tbl20']
# use pool.map instead of creating boiler-plate threading class
pool.map(process_table, table_list)
# are these in the right spots?
pool.close()
pool.join()
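
A minimal sketch of an alternative, assuming the same setup_query and 
process_chunk helpers and connect string as above: skip the ORM session, give 
each thread its own pooled connection, and raise arraysize so the cx_Oracle 
driver fetches rows in batches instead of one at a time (arraysize=1 above 
forces a round trip per row):

import pandas as pd
import sqlalchemy

# hypothetical value; tune for your row width and memory budget
oracle_engine = sqlalchemy.create_engine(oracle_connect_str, arraysize=1000)

def process_table(table):
    # each thread checks out its own connection from the engine's pool
    with oracle_engine.connect() as conn:
        for i, chunk in enumerate(
                pd.read_sql(setup_query(table), conn, chunksize=10000)):
            process_chunk(chunk, table, i)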



Thanks!




Re: [sqlalchemy] ORM does not select column from nested union

2019-12-02 Thread Mikhail Knyazev
Mike, thanks a lot for your help! Sorry for the over-distilled example; I'm 
adding some context to (maybe) make the issue more searchable. Your guess is 
on point: I'm trying to select two querysets from the same table. One of them 
should be sorted by the `modified_at` column while the other should use the 
`created_at` column. Then the resulting query is augmented with row_number, 
like `SELECT ..., row_number() OVER (ORDER BY order_by) FROM ...`, and used 
in `select_entity_from`.

Using constructed columns like you suggested did the trick.
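
For searchability, a hedged reconstruction of the working query (the 
row_number step is assumed from the description above; Article and session as 
in the example further down):

from sqlalchemy import column, func, select, union

inner = union(
    select([Article, column('modified_at').label('order_by')]),
    select([Article, column('created_at').label('order_by')]),
).alias()

# number the combined rows by the per-arm sort key
numbered = select([
    inner,
    func.row_number().over(order_by=inner.c.order_by).label('rn'),
]).alias()

query = session.query(Article).select_entity_from(numbered).order_by(numbered.c.rn)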


On Saturday, November 30, 2019 at 2:28:07 AM UTC+3, Mike Bayer wrote:
>
> quick and dirty I would just manufacture that column so that it has no 
> Python-side correspondence:
>
> from sqlalchemy import column
>
> select_entity_from(
>     union(
>         select([Article, column('modified_at').label('order_by')]),
>         select([Article, column('created_at').label('order_by')]),
>     )
> ).order_by("order_by")  # I'm assuming this is what you are actually doing
>
> also this *might* be different if you tried github master / 1.4, where 
> there have been changes to how dupe columns are handled
>
>
> On Fri, Nov 29, 2019, at 6:20 PM, Mike Bayer wrote:
>
>
>
> On Fri, Nov 29, 2019, at 1:53 PM, Mikhail Knyazev wrote:
>
> Hi, Mike.
> I ran into strange behavior; see the example below. In short, the ORM does 
> not select a column from a nested union when a label is assigned to the 
> column. This results in mixed-up attributes of a mapped object.
>
>
> hi 
>
> the "modified_at" / "created_at" columns are being repeated in each SELECT 
> in a mixed way and I would guess this is confusing the ORM, which is likely 
> using that last column to populate "modified_at" in the entity.  the 
> query as given doesn't seem to be using this "order_by" column and it's 
> also selecting dupes so I believe we have to resolve for a modified  XY 
> problem here [1]  with the modification that "user doesn't know how to do 
> Y" should read "SQLAlchemy can't really do Y without some trickery if at 
> all"     What is the *actual* thing you need to do ?
>
> [1] http://xyproblem.info/
>
>
> from datetime import datetime
>
> from sqlalchemy import Column, DateTime, Integer, create_engine, select, union
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy.orm import sessionmaker
>
> Base = declarative_base()
>
>
> class Article(Base):
>     __tablename__ = 'article'
>     id = Column(Integer, primary_key=True)
>     created_at = Column(DateTime)
>     modified_at = Column(DateTime)
>
>
> engine = create_engine('sqlite:///:memory:', echo=True)
> Base.metadata.create_all(engine)
> Session = sessionmaker(bind=engine)
> session = Session()
>
> dt1, dt2 = datetime(2011, 1, 1), datetime(2012, 2, 2)
> dt3, dt4 = datetime(2013, 3, 3), datetime(2014, 4, 4)
>
> article1 = Article(created_at=dt1, modified_at=dt2)
> article2 = Article(created_at=dt3, modified_at=dt4)
> session.add_all((article1, article2))
> session.commit()
> session.expunge_all()
>
> query = (
>     session.query(Article)
>     .select_entity_from(
>         union(
>             select((Article, Article.modified_at.label('order_by'))),
>             select((Article, Article.created_at.label('order_by'))),
>         )
>     )
>     .order_by(Article.id)
> )
>
> article1, article2 = query.all()
>
> print('article1')
> print('\tcreated_at', article1.created_at, '\t\texpected', dt1.isoformat())
> print('\tmodified_at', article1.modified_at, '\texpected', dt2.isoformat())
> print('article2')
> print('\tcreated_at', article2.created_at, '\t\texpected', dt3.isoformat())
> print('\tmodified_at', article2.modified_at, '\texpected', dt4.isoformat())
>
> # article1
> #     created_at 2011-01-01 00:00:00   expected 2011-01-01T00:00:00
> #     modified_at 2011-01-01 00:00:00  expected 2012-02-02T00:00:00
> # article2
> #     created_at 2013-03-03 00:00:00   expected 2013-03-03T00:00:00
> #     modified_at 2013-03-03 00:00:00  expected 2014-04-04T00:00:00
>
> Rendered SQL query:
>
> SELECT anon_1.id AS anon_1_id, anon_1.order_by AS anon_1_order_by,
>        anon_1.modified_at AS anon_1_modified_at
> FROM (SELECT article.id          AS id,
>              article.created_at  AS created_at, -- this field is not selected by the outer SELECT
>              article.modified_at AS modified_at,
>              article.created_at  AS order_by
>       FROM article
>       UNION
>       SELECT article.id          AS id,
>              article.created_at  AS created_at, -- this field is not selected by the outer SELECT
>              article.modified_at AS modified_at,
>              article.modified_at AS order_by
>       FROM article) AS anon_1
> ORDER BY anon_1.id