Re: [sqlalchemy] Attaching a second database to a connection

2020-07-06 Thread Mike Bayer


On Mon, Jul 6, 2020, at 11:19 PM, Richard Damon wrote:
> SQLite allows a program to attach multiple databases to a single
> connection, and you are able to reference tables in these additional
> databases with things like schema.table as the name of a table.
> 
> Is there a way to do this in SQLAlchemy?

sure — you use SQLite's ATTACH DATABASE command, usually from a connection event
so it runs for every new connection automatically. Here is code from our test suite:

    from sqlalchemy import create_engine, event

    engine = create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def connect(dbapi_connection, connection_record):
        dbapi_connection.execute(
            'ATTACH DATABASE "test_schema.db" AS test_schema'
        )

Then you reference the attached database as a schema:
Table(..., schema="test_schema").





> 
> I am working on an application that will want to import data from
> another database (one that uses basically the same schema, maybe just a
> subset of the schema of the main database), and determine the updates to
> perform by using a join on unique keys (which aren't necessarily the
> primary keys).
> 
> After finishing the update and pulling the information in (remapping
> rowid/primary keys <-> foreign keys that didn't match between the
> databases), I would then detach this database (which ideally I opened as
> a read-only connection).
> 
> I can see how to establish multiple engines and sessions, but then I
> can't do the join between the databases, which would let me push a lot of
> the work down into the database engine. I have also found that it's
> possible to bind different sets of tables to different engines, but in my
> case both databases will have the same set of tables, so that doesn't look
> like it will work.
> 
> -- 
> Richard Damon
> 



[sqlalchemy] Attaching a second database to a connection

2020-07-06 Thread Richard Damon
SQLite allows a program to attach multiple databases to a single
connection, and you are able to reference tables in these additional
databases with things like schema.table as the name of a table.

Is there a way to do this in SQLAlchemy?

I am working on an application that will want to import data from
another database (one that uses basically the same schema, maybe just a
subset of the schema of the main database), and determine the updates to
perform by using a join on unique keys (which aren't necessarily the
primary keys).

After finishing the update and pulling the information in (remapping
rowid/primary keys <-> foreign keys that didn't match between the
databases), I would then detach this database (which ideally I opened as
a read-only connection).

I can see how to establish multiple engines and sessions, but then I
can't do the join between the databases, which would let me push a lot of
the work down into the database engine. I have also found that it's
possible to bind different sets of tables to different engines, but in my
case both databases will have the same set of tables, so that doesn't look
like it will work.

-- 
Richard Damon



Re: [sqlalchemy] SQLAlchemy taking too much time to process the result

2020-07-06 Thread Mike Bayer


On Mon, Jul 6, 2020, at 2:14 PM, Saylee M. wrote:
> Hello all, 
> Hope you are fine and safe in these times!
> 
> I can easily be considered a novice in SQLAlchemy.
> I am trying to pull data from a MySQL database, from a table having around 20
> columns and more than 10 million rows.
> The table is partitioned and indexed. I am using a complex query, a
> UNION of multiple sub-queries, in the format:
> 
> with cte1 as (<subquery1>),
> cte2 as (<subquery2>),
> ...
> select * from cte1
> union all
> select * from cte2
> ;
> 
> The code demands that the query be dynamic; the number of sub-queries
> depends on the user input.
> For illustration, I took a query having two sub-queries.
> So, when I ran the query against MySQL directly, it took very little time
> (around 0.016 seconds), but when I ran the same
> query through the SQLAlchemy connector, it took around 600 seconds.

hi -

when you say "MySQL directly", what are you referring to: your command-line
client, or the Python driver? Also, which Python driver are you using?

A baseline for how much time the operation needs to take should be produced
by using the Python driver that you have with the *exact* SQL query you're
starting with, then fully fetching all rows in Python. It sounds a little off
that the mysqlclient driver can fetch ten million rows in 0.016 seconds. At the
end of this message is a test script that fetches 5 million rows with just two
columns each, using the raw mysqlclient driver. mysqlclient is a native driver
that is as fast as you can get. I'm running it on my laptop with *no network*,
against a local MySQL server, and it takes 7 seconds to fetch that many rows.
Ten million rows is a lot, and it's not going to be very fast in any case.

That said, the ORM, when it fetches full Python objects, is *much* slower than
fetching rows, because there is a lot of Python overhead in building up the
objects. If you have 10M rows, just for Python to set aside the memory to store
10M heavy Python objects at once will take many seconds, and once it churns
into swap space the time will grow dramatically. It's unlikely you need the
full-blown business object functionality on 10M rows at once, so I would fetch
columns instead of objects. There's discussion about this in the FAQ at
https://docs.sqlalchemy.org/en/13/faq/performance.html#result-fetching-slowness-orm
 . Additionally, the ORM can yield out objects in batches using yield_per(),
keeping in mind that the database driver has probably buffered the raw rows in
any case.
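
(A rough sketch of those two approaches; the Data model, session, and
process() here are hypothetical stand-ins:)

    # fetch plain named tuples of columns; no ORM object construction
    for id_, x, y in session.query(Data.id, Data.x, Data.y):
        process(id_, x, y)

    # or stream full ORM objects in batches of 10,000 via yield_per()
    for obj in session.query(Data).yield_per(10000):
        process(obj)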

What I would suggest is to take a look at the example fetching suite at
https://docs.sqlalchemy.org/en/13/_modules/examples/performance/large_resultsets.html
 . It illustrates all the different ways you can fetch rows with SQLAlchemy
and compares the speed of each. You can run this suite straight from the
distribution, and it will show the relative differences in speed between
different kinds of fetches. The ORM in particular has a lot of work to do in
both generating Python objects and populating them; if you fetch rows with
columns instead of objects, that will save most of the time.

To get a feel for this suite, here's a run from my own laptop:

$ python -m examples.performance large_resultsets --dburl mysql://scott:tiger@localhost/test
Running setup once...
Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks, test_orm_bundles, test_orm_columns, test_core_fetchall, test_core_fetchmany_w_streaming, test_core_fetchmany, test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object
test_orm_full_objects_list : Load fully tracked ORM objects into one big list(). (50 iterations); total time 6.135940 sec
test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a time using yield_per(). (50 iterations); total time 3.340366 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM. (50 iterations); total time 0.949388 sec
test_orm_columns : Load individual columns into named tuples using the ORM. (50 iterations); total time 0.560157 sec
test_core_fetchall : Load Core result rows using fetchall. (50 iterations); total time 0.466407 sec
test_core_fetchmany_w_streaming : Load Core result rows using fetchmany/streaming. (50 iterations); total time 0.339930 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany. (50 iterations); total time 0.470984 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), generate an object for each row. (50 iterations); total time 0.476398 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make any objects. (50 iterations); total time 0.330984 sec





import random
import time

from sqlalchemy import create_engine

e = create_engine("mysql://scott:tiger@localhost/test")

with e.connect() as conn:
    conn.execute("drop table if exists data")
    conn.execute("create table data (x integer, y integer)")
    for chunk in range(50):  # 5M rows total, inserted in chunks of 100k
        conn.execute(
            "insert into data (x, y) values (%s, %s)",
            [(random.randint(1, 100), random.randint(1, 100)) for _ in range(100000)],
        )

# (the message was truncated here in the archive; a plausible completion per
# the description above: time a full fetch through the raw mysqlclient driver)
cursor = e.raw_connection().cursor()
now = time.time()
cursor.execute("select * from data")
rows = cursor.fetchall()
print("fetched %d rows in %f seconds" % (len(rows), time.time() - now))

[sqlalchemy] Re: SQLAlchemy taking too much time to process the result

2020-07-06 Thread 'Jonathan Vanasco' via sqlalchemy


On Monday, July 6, 2020 at 2:14:33 PM UTC-4, Saylee M. wrote:
 

> So, when I ran the query against MySQL directly, it took very little time
> (around 0.016 seconds), but when I ran the same
> query through the SQLAlchemy connector, it took around 600 seconds
>

"query ... MySQL directly"

Do you mean using the MySQL commandline client?  Assuming yes, the 0.016 
time only reflects the time MySQL spent processing the query and generating 
the result set; the SQLAlchemy time includes that + transferring all the 
data + generating Python data structures (which could be SQLAlchemy ORM 
models or generic python data structures)

There are also external factors that can account for time changes - like 
server load, index loading, cache utilization 

> I am not sure what the issue can be. It'll be great if I can get any
> pointers to reduce the time, preferably under 10 seconds!
>

Showing a short, self-contained, correct example (SSCCE) of your code would
let others troubleshoot it more effectively.  The most likely explanation,
though, is that you are loading all the rows.  There should be no
difference in the query time itself.
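
One way to confirm that is to stream the result instead of buffering it, so
the row-fetching cost shows up in the iteration loop; a sketch assuming Core
with the mysqlclient driver (the URL and table name are placeholders):

    import time

    from sqlalchemy import create_engine, text

    engine = create_engine("mysql://scott:tiger@localhost/test")

    with engine.connect() as conn:
        start = time.time()
        # stream_results asks the driver for an unbuffered (server-side) cursor
        result = conn.execution_options(stream_results=True).execute(
            text("SELECT * FROM big_table")
        )
        count = sum(1 for row in result)

    print("fetched %d rows in %.3f seconds" % (count, time.time() - start))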
 



Re: [sqlalchemy] convert subset to dictionary

2020-07-06 Thread Justvuur
I added the eager-loading, but it seems to slow down the SQL query quite a
lot.
It's as if the SQL query now takes longer but generating the file is
quicker, hehe... I guess the queries are now fired up front to populate
subjects.
Overall it's still taking roughly the same amount of time, though.
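
(For reference, a minimal sketch of the two common eager-loading styles for
the student.subjects relationship discussed below, assuming the session and
Student model from the quoted code further down; which option is faster
depends on the data:)

    from sqlalchemy.orm import joinedload, selectinload

    # joined eager load: subjects come back in the same, larger SELECT
    students = session.query(Student).options(
        joinedload(Student.subjects)).all()

    # selectin eager load: one extra SELECT loads all subjects at once,
    # often cheaper than a big JOIN for large collections
    students = session.query(Student).options(
        selectinload(Student.subjects)).all()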


On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote:
>
> Are you eager-loading the "student.subjects" relationship? If not, 
> that will give you the biggest performance increase. Without that, you 
> will be issuing a separate DB query for each of the students, to load 
> that student's subjects. Eager-loading allows you to preload the 
> subjects for every student in a single query: 
>
>
> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
>  
>
> Simon 
>
> On Fri, Jul 3, 2020 at 1:36 PM Justvuur wrote:
> >
> > Hi Simon, thanks for the help! I've never used that before, it's quite
> > handy.
> >
> > I'm looping through all the students and printing them and their subject
> > details to a CSV file.
> > What makes things a tad complicated is the subjects must appear in a
> > specific order.
> > There is a table that has the subject code and order number
> > (ordered_subjects used below is the resultset from it).
> > I printed out the timing and found the problem to be with a nested for
> > loop.
> >
> > I was hoping to reduce that process time by using a map that
> > automatically gets populated instead of having to create it on the fly.
> >
> > Before - subjects_collection "attribute_mapped_collection":
> >
> >     for row in students:
> >         row_no += 1
> >
> >         for subject in row.subjects:
> >             student_subjects[subject.code] = subject.value
> >
> >         csv_row = [row_no]
> >         csv_row += [student_subjects.get(x.code, '') for x in ordered_subjects]
> >         csv_row += [row.created_on, row.updated_on]
> >
> >         writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
> >
> > After adding the subjects_collection "attribute_mapped_collection", I
> > unfortunately did not see a change in performance.
> >
> > After - subjects_collection "attribute_mapped_collection":
> >
> >     for row in students:
> >         row_no += 1
> >         csv_row = [row_no]
> >         csv_row += [row.subjects_collection.get(x.code, '').value for x in ordered_subjects]
> >         csv_row += [row.created_on, row.updated_on]
> >
> >         writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
> >
> >     class Subject(db.Model):
> >         __tablename__ = 'subjects'
> >
> >         student_id = db.Column(db.Integer, db.ForeignKey('students.id'),
> >                                primary_key=True)
> >
> >         code = db.Column(db.String(50), primary_key=True)
> >
> >         value = db.Column(db.String)
> >
> >         def __init__(self, code, value):
> >             self.code = code
> >             self.value = value
> >
> >     class Student(ResourceMixin, db.Model):
> >         __tablename__ = 'students'
> >
> >         subjects = db.relationship('Subject', backref='student')
> >
> >         id = db.Column(db.Integer, primary_key=True)
> >
> >         subjects_collection = relationship("Subject",
> >             collection_class=attribute_mapped_collection('code'))
> >
> > Can you see a way I can optimize this? Any ideas?
> > 
> > 
> > On Friday, 3 July 2020 12:31:03 UTC+2, Simon King wrote:
> >>
> >> Are you trying to optimise the database access (ie. minimize the
> >> number of queries), or provide a nice dictionary-style API for your
> >> Student objects? What do you mean when you say that looping over
> >> student.subjects is quite heavy?
> >>
> >> An association proxy can be used to get dict-style access to a
> >> relationship:
> >>
> >> https://docs.sqlalchemy.org/en/13/orm/extensions/associationproxy.html#proxying-to-dictionary-based-collections
> >>
> >> There are also a couple of examples in the SQLAlchemy docs that
> >> provide a dictionary-style API:
> >>
> >> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.dynamic_dict
> >>
> >> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.vertical
> >>
> >> Hope that helps,
> >>
> >> Simon
> >>
> >> On Thu, Jul 2, 2020 at 8:46 PM Justvuur wrote:
> >> >
> >> > Hi there,
> >> >
> >> > I'm struggling to find an efficient way to get a two-columned subset
> >> > into dictionary form.
> >> >
> >> > I have an entity that has a subset of data. The subset is linked to
> >> > the entity via Id. The order of the subset of data is defined in
> >> > another table.
> >> >
> >> > Example:
> >> > Student - Id, firstname, lastname
> >> > Subjects - StudentId, SubjectCode, SubjectName
> >> >
> >> > At the moment I'm looping through

[sqlalchemy] SQLAlchemy taking too much time to process the result

2020-07-06 Thread Saylee M.
Hello all, 
Hope you are fine and safe in these times!

I can easily be considered a novice in SQLAlchemy.
I am trying to pull data from a MySQL database, from a table having around
20 columns and more than 10 million rows.
The table is partitioned and indexed. I am using a complex query, a UNION
of multiple sub-queries, in the format:

with cte1 as (<subquery1>),
cte2 as (<subquery2>),
...
select * from cte1
union all
select * from cte2
;

The code demands that the query be dynamic; the number of sub-queries
depends on the user input.
For illustration, I took a query having two sub-queries.
So, when I ran the query against MySQL directly, it took very little time
(around 0.016 seconds), but when I ran the same
query through the SQLAlchemy connector, it took around 600 seconds.
I tried generating the query using only a text clause in one approach, and
I also tried a combination of .select(), .where() and .and_ in another.
However, both were taking a very long time.
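
For reference, a hedged sketch of building that shape of query dynamically
with SQLAlchemy Core (the table, columns, and filter values are stand-ins
for the real ones):

    from sqlalchemy import Column, Integer, MetaData, Table, select, union_all

    metadata = MetaData()
    data = Table("data", metadata, Column("x", Integer), Column("grp", Integer))

    # one CTE per user-supplied condition
    conditions = [1, 2]  # stands in for the user input
    ctes = [
        select([data]).where(data.c.grp == cond).cte("cte%d" % i)
        for i, cond in enumerate(conditions, 1)
    ]

    # SELECT * FROM cte1 UNION ALL SELECT * FROM cte2 ...
    query = union_all(*[select([cte]) for cte in ctes])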

I am not sure what the issue can be. It'll be great if I can get any
pointers to reduce the time, preferably under 10 seconds!

Thanks in advance!

Regards,
Saylee



Re: [sqlalchemy] "Virtual models" for JSONB data?

2020-07-06 Thread Pedro Ferreira
OK, thanks a lot anyway! I was just wondering if there was already some
solution I could reuse.

Cheers,

Pedro

On 03.07.20 17:26, Mike Bayer wrote:

> I suppose. That seems really complicated. When we use the ORM, we're
> defining our domain model in terms of objects, then we define a
> persistence layer in terms of relational tables. The Declarative
> approach is a compromise between the more strict mapper() approach,
> which looks like mapper(class, Table), and the need to have the
> declaration of classes and their data members be succinct and
> localized. However, if you are defining an alternative form of
> persistence, then all of the ORM capabilities go away; they'd have to
> be reimplemented in terms of that new persistence model.
> 
> Short answer: things like relationships, and all of that, you would
> need to re-implement in some other way that works in terms of JSON
> records.


