Re: [sqlalchemy] convert subset to dictionary

2020-07-07 Thread 'Jonathan Vanasco' via sqlalchemy
Based on what you shared above:

* The "Subject" table is: `StudentId, SubjectCode, SubjectName`
* There are 181 subjects

It looks like you don't have a "Subject" table, but a "StudentAndSubject" 
table.

I think you'd see a bigger performance improvement by normalizing that
data into two tables:

Subject: SubjectId (primary key), SubjectCode, SubjectName
Student2Subject: StudentId, SubjectId (the primary key is both columns)

Assuming this can be done with your data... the database performance should
improve because:

1. The raw file storage will decrease
2. The in-memory dataset size will decrease

You could then either

1. use the Subject table as part of a joined query to keep things simple, or
2. just select off a join of Student+Student2Subject, and query all the
Subjects separately. Even if there are 2000 subjects total, it should only
take a few ms to get all that into a Python data structure that is used to
generate your CSV.
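
A rough sketch of that layout in SQLAlchemy (untested; the names are
placeholders, adjust to your data):

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Student(Base):
        __tablename__ = 'students'
        id = Column(Integer, primary_key=True)

    class Subject(Base):
        __tablename__ = 'subjects'
        # one row per subject, instead of one row per student+subject
        subject_id = Column(Integer, primary_key=True)
        subject_code = Column(String(50))
        subject_name = Column(String)

    class Student2Subject(Base):
        __tablename__ = 'student2subject'
        # composite primary key: one row per (student, subject) pair
        student_id = Column(Integer, ForeignKey('students.id'),
                            primary_key=True)
        subject_id = Column(Integer, ForeignKey('subjects.subject_id'),
                            primary_key=True)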




Re: [sqlalchemy] convert subset to dictionary

2020-07-07 Thread Justvuur
Nope, I'm running it on a Xubuntu VM. The VM only has 4 gigs of RAM and 4 
cores assigned to it.
Nope, the students table has created_on, updated_on, id, 3 integer columns 
and a boolean column.
The subjects table has student_id, code and name columns. The code column 
is character varying(100) and the name column is character varying.

Yes, so it's not over the net; it's running locally.
That's a good idea! Maybe I'll try that!
Thanks for your help on this, I really appreciate it.



On Tuesday, 7 July 2020 15:56:04 UTC+2, Simon King wrote:
>
> 2:25 minutes, even for 1,267,000 rows, sounds like quite a long time 
> to me. Are you physically a long way from the database, querying 
> across the internet? Do the tables contain many columns, or perhaps 
> large BLOB or TEXT fields? 
>
> If there are large fields that you don't need very often, you could 
> consider marking them as "deferred", then they wouldn't be loaded by 
> default: 
>
>
> https://docs.sqlalchemy.org/en/13/orm/loading_columns.html#deferred-column-loading
>  
>
> I suppose if your database server was underpowered, loading over a 
> million rows might exhaust its memory and cause it to start swapping, 
> which would hurt performance, in which case querying for smaller 
> amounts of data might be better. 
>
> Simon 
>
> On Tue, Jul 7, 2020 at 12:53 PM Justvuur wrote:
> > 
> > I'm currently testing with 7000 students with 181 subjects.
> > I first went over to the DB to run the query directly on there. I did an
> > inner join between the two tables.
> > It took about 2:25 minutes. I then selected specific columns from the
> > two tables and the time dropped to about 2 minutes or so.
> > I then tried a "select subjects where student ID in (...)" query and
> > that executed a lot quicker than the others.
> >
> > So then on the Python side of things, I first fetched the students and
> > only selected id, created_on and updated_on columns using the with_entities
> > option.
> > That seemed to take less than 1 second. I then, for each student,
> > executed a DB query to fetch the subjects using the with_entities option.
> > For some reason, this seems to work faster than using the
> > eager-loading/students.subjects/attribute_mapped_collection. They seem to
> > take 2 minutes longer than the above (what should be inefficient) approach.
> > 
> > On Tuesday, 7 July 2020 at 12:50:13 UTC+2 Simon King wrote: 
> >> 
> >> How long is it taking? You mentioned 2000 students with 100 subjects 
> >> each, so there are something like 200,000 rows in the Subjects table, 
> >> and you need to load all of it. I wouldn't expect that to take longer 
> >> than a couple of seconds though. 
> >> 
> >> Simon 
> >> 
> >> On Mon, Jul 6, 2020 at 7:34 PM Justvuur  wrote: 
> >> > 
> >> > I added the eager-loading but it seems to slow down the SQL query
> >> > quite a lot.
> >> > It's as if now, the SQL query is taking longer but the generating of
> >> > the file is quicker hehe... I guess now the queries are being fired
> >> > before to populate subjects.
> >> > It's still taking relatively the same amount of time though.
> >> > 
> >> > 
> >> > On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote: 
> >> >> 
> >> >> Are you eager-loading the "student.subjects" relationship? If not,
> >> >> that will give you the biggest performance increase. Without that, you
> >> >> will be issuing a separate DB query for each of the students, to load
> >> >> that student's subjects. Eager-loading allows you to preload the
> >> >> subjects for every student in a single query:
> >> >>
> >> >> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
> >> >> 
> >> >> Simon 
> >> >> 
> >> >> On Fri, Jul 3, 2020 at 1:36 PM Justvuur  wrote: 
> >> >> > 
> >> >> > Hi Simon, thanks for the help! I've never used that before, it's
> >> >> > quite handy.
> >> >> >
> >> >> > I'm looping through all the students and printing them and their
> >> >> > subject details to a CSV file.
> >> >> > What makes things a tad complicated is the subjects must appear in
> >> >> > a specific order.
> >> >> > There is a table that has the subject code and order number
> >> >> > (ordered_subjects used below is the resultset from it).
> >> >> > I printed out the timing and found the problem to be with a nested
> >> >> > for loop.
> >> >> >
> >> >> > I was hoping to reduce that process time by using a map that
> >> >> > automatically gets populated instead of having to create it on the fly.
> >> >> >
> >> >> > Before - subjects_collection "attribute_mapped_collection":
> >> >> >
> >> >> > for row in students:
> >> >> >     row_no += 1
> >> >> >
> >> >> >     for subject in row.subjects:
> >> >> >         student_subjects[subject.code] = subject.value
> >> >> >
> >> >> >     csv_row = [row_no]
> >> >> >     csv_row += [student_subjects.get(x.code, '') for x in ordered_subjects]
> >> >> >     csv_row += [row.created_on,

Re: [sqlalchemy] Attaching a second database to a connection

2020-07-07 Thread Mike Bayer


On Tue, Jul 7, 2020, at 8:11 AM, Richard Damon wrote:
> Ok, I guess I knew you could execute explicit SQL but wasn't thinking
> about it or coming across it in my searches. In my case I wouldn't want
> to automatically connect, as it will be done at a specific time for a
> specific operation, so I could do the ATTACH specifically.

OK, you can do the ATTACH on the database connection from the engine as you get
it. SQLite uses NullPool in any case, so that command would be transitory once
you close that connection.
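
For example, something like this (a rough sketch against the 1.3-style
API; the file and schema names are placeholders):

    from sqlalchemy import create_engine

    engine = create_engine("sqlite:///main.db")

    conn = engine.connect()
    # a plain string is passed through to the sqlite3 driver as-is
    conn.execute('ATTACH DATABASE "updates.db" AS updates')

    # ... cross-database work here, e.g. joins against updates.* ...

    conn.execute("DETACH DATABASE updates")
    conn.close()  # with NullPool the ATTACH wouldn't outlive this anyway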


> 
> Will I need to explicitly recreate all the schema for the tables? This
> second database will have an identical schema to the main (in fact, it
> will be generated by the program, as this is a way to bring in updates),
> though maybe I won't bother setting up a full ORM model and be using
> more explicit SQL to get the data.

If the database you're attaching already has the tables in it, then that's your
"schema"; I don't see what you would need to "recreate", assuming you mean
emitting "CREATE TABLE" statements.

oh, but if you are ATTACHing a blank database and you *want* it to have those 
tables, then yes, you need to emit CREATE TABLE for all of those.
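
For example (a sketch; the "stuff" table is a placeholder, and the ATTACH
reuses the event recipe quoted below):

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, event)

    engine = create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def connect(dbapi_connection, connection_record):
        # every connection sees the attached file
        dbapi_connection.execute(
            'ATTACH DATABASE "test_schema.db" AS test_schema'
        )

    metadata = MetaData()

    # schema="test_schema" targets the ATTACHed database
    stuff = Table(
        "stuff",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("data", String),
        schema="test_schema",
    )

    metadata.create_all(engine)  # emits CREATE TABLE test_schema.stuff (...)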

SQLAlchemy isn't doing anything automatic here; it just emits the SQL commands
you tell it to. At the general level, think of this as working with the
sqlite3 module directly, just with a tool to help you write some of the
commands.





> 
> On 7/7/20 12:46 AM, Mike Bayer wrote:
> >
> >
> > On Mon, Jul 6, 2020, at 11:19 PM, Richard Damon wrote:
> >> SQLite allows a program to attach multiple databases to a single
> >> connection, and you are able to reference tables in these additional
> >> databases with things like schema.table as the name of a table.
> >>
> >> Is there a way to do this in SQLAlchemy?
> >
> > sure, you use SQLite's ATTACH DATABASE command, usually using an event
> > so it occurs for all connections automatically, here is code from our
> > test suite:
> >
> > from sqlalchemy import event
> >
> > engine = create_engine("sqlite://")
> >
> > @event.listens_for(engine, "connect")
> > def connect(dbapi_connection, connection_record):
> >     dbapi_connection.execute(
> >         'ATTACH DATABASE "test_schema.db" AS test_schema'
> >     )
> >
> > then you reference the attached database as a schema, Table(...,
> > schema="test_schema")
> >
> >>
> >> I am working on an application that will want to import data from
> >> another database (that uses basically the same schema, maybe just a
> >> subset of the schema of the main database), and get the updates needed
> >> to perform by using a join on unique keys (that aren't necessarily the
> >> primary key).
> >>
> >> After finishing the update, and pulling the information in (remapping
> >> rowid/primary keys <-> foreign keys that didn't match between the
> >> databases) I would then detach this database (which ideally I opened as
> >> a read only connection).
> >>
> >> I can see how to establish multiple engines and sessions, but then I
> >> can't do the join between the databases which would let me do a lot of
> >> the work down in the database engine. I also have found being able to
> >> bind different sets of tables into different engines, but in my case the
> >> database will have the same set of tables, so this doesn't look to work.
> >>
> >> -- 
> >> Richard Damon
> >>
> 
> -- 
> Richard Damon
> 
> 



Re: [sqlalchemy] convert subset to dictionary

2020-07-07 Thread Simon King
2:25 minutes, even for 1,267,000 rows, sounds like quite a long time
to me. Are you physically a long way from the database, querying
across the internet? Do the tables contain many columns, or perhaps
large BLOB or TEXT fields?

If there are large fields that you don't need very often, you could
consider marking them as "deferred", then they wouldn't be loaded by
default:

https://docs.sqlalchemy.org/en/13/orm/loading_columns.html#deferred-column-loading
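
A minimal sketch of what that looks like (the model and the large column
here are made up):

    from sqlalchemy import Column, Integer, LargeBinary, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import deferred

    Base = declarative_base()

    class Student(Base):
        __tablename__ = 'students'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        # not loaded until the attribute is actually accessed
        photo = deferred(Column(LargeBinary))

A column can also be deferred for just one query with
query.options(defer(Student.photo)).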

I suppose if your database server was underpowered, loading over a
million rows might exhaust its memory and cause it to start swapping,
which would hurt performance, in which case querying for smaller
amounts of data might be better.

Simon

On Tue, Jul 7, 2020 at 12:53 PM Justvuur  wrote:
>
> I'm currently testing with 7000 students with 181 subjects.
> I first went over to the DB to run the query directly on there. I did an 
> innerjoin between the two tables.
> It took about 2:25 minutes. I then selected specific columns from the two 
> tables and the time dropped to about 2 minutes or so.
> I then tried a "select subjects where student ID in (...)" query and that 
> executed a lot quicker than the others.
>
> So then on the python side of things, I first fetched the students and only 
> selected id, created_on and updated_on columns using the with_entities option.
> That seemed to take less than 1 second. I then, for each student, executed a 
> DB query to fetch the subjects using the with_entities option.
> For some reason, this seems to work faster than using the  
> eager-loading/students.subjects/attribute_mapped_collection. They seem to 
> take 2 minutes longer than the above (what should be inefficient) approach.
>
> On Tuesday, 7 July 2020 at 12:50:13 UTC+2 Simon King wrote:
>>
>> How long is it taking? You mentioned 2000 students with 100 subjects
>> each, so there are something like 200,000 rows in the Subjects table,
>> and you need to load all of it. I wouldn't expect that to take longer
>> than a couple of seconds though.
>>
>> Simon
>>
>> On Mon, Jul 6, 2020 at 7:34 PM Justvuur  wrote:
>> >
>> > I added the eager-loading but it seems to slow down the SQL query quite a 
>> > lot.
>> > It's as if now, the SQL query is taking longer but the generating of the 
>> > file is quicker hehe... I guess now the queries are being fired before to 
>> > populate subjects.
>> > It's still taking relatively the same amount of time though.
>> >
>> >
>> > On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote:
>> >>
>> >> Are you eager-loading the "student.subjects" relationship? If not,
>> >> that will give you the biggest performance increase. Without that, you
>> >> will be issuing a separate DB query for each of the students, to load
>> >> that student's subjects. Eager-loading allows you to preload the
>> >> subjects for every student in a single query:
>> >>
>> >> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
>> >>
>> >> Simon
>> >>
>> >> On Fri, Jul 3, 2020 at 1:36 PM Justvuur  wrote:
>> >> >
>> >> > Hi Simon, thanks for the help! I've never used that before, it's quite 
>> >> > handy.
>> >> >
>> >> > I'm looping through all the students and printing them and their 
>> >> > subject details to a CSV file.
>> >> > What makes things a tad complicated is the subjects must appear in a 
>> >> > specific order.
>> >> > There is a table that has the subject code and order number 
>> >> > (ordered_subjects used below is the resultset from it).
>> >> > I printed out the timing and found the problem to be with a nested for 
>> >> > loop.
>> >> >
>> >> > I was hoping to reduce that process time by using a map that 
>> >> > automatically gets populated instead of having to create it on the fly.
>> >> >
>> >> > Before - subjects_collection "attribute_mapped_collection":
>> >> > 
> >> >> > for row in students:
> >> >> >     row_no += 1
> >> >> >
> >> >> >     for subject in row.subjects:
> >> >> >         student_subjects[subject.code] = subject.value
> >> >> >
> >> >> >     csv_row = [row_no]
> >> >> >     csv_row += [student_subjects.get(x.code, '') for x in ordered_subjects]
> >> >> >     csv_row += [row.created_on, row.updated_on]
> >> >> >
> >> >> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
>> >> >
>> >> >
>> >> > After adding the subjects_collection "attribute_mapped_collection", I 
>> >> > unfortunately did not see a change in performance.
>> >> >
>> >> > After - subjects_collection "attribute_mapped_collection":
>> >> > 
> >> >> > for row in students:
> >> >> >     row_no += 1
> >> >> >     csv_row = [row_no]
> >> >> >     csv_row += [row.subjects_collection.get(x.code, '').value for x in ordered_subjects]
> >> >> >     csv_row += [row.created_on, row.updated_on]
> >> >> >
> >> >> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x

Re: [sqlalchemy] "Virtual models" for JSONB data?

2020-07-07 Thread Burak Arslan

On 03/07/2020 16:21, Pedro Ferreira wrote:
> Hi,
>
> I was wondering if there's some way to create some sort of a "virtual"
> model which is not based on an actual table?
>
> Use case: I have a model (e.g. `Room`) which contains a `data` JSONB
> attribute/column which is supposed to store plugin-specific data. I
> would like plugin code to be able to create a "virtual" model which
> references `Room` 1-to-1 and injects a `plugin_data` backref into it.
> That means I would be able to do `room.plugin_data.user`, for instance,
> which would use `room.data['user_id']` to fetch a `User` object.


Spyne contains some work towards that goal.

I added an example[1]. What works: the Preferences object gets
serialized to JSON and stored as JSONB. It's also deserialized from JSON
back to an instance when read. It's implemented as a generic UserDefinedType[2].

Caveats:

 1. It's PostgreSQL-only (never tested on other RDBMSes; maybe it'll work!
    Help is welcome.)
 2. It doesn't do fancy ORM stuff like mapping the value of a user_id
    field to a User instance.
 3. It doesn't automatically mark stuff dirty as a field in the subobject
    changes[3].
 4. It doesn't support marking subfields dirty -- when the subobject is
    marked as dirty (using flag_dirty[4]), the whole JSON document is
    updated (using jsonb_set[5]) instead of just the changed subfield.
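
If you just want the basic shape of the idea without Spyne (object in,
JSONB document out), a minimal generic sketch with a plain TypeDecorator
could look like the following; caveats 2-4 above apply to it just the
same, and the Preferences class here is made up:

    from sqlalchemy.dialects.postgresql import JSONB
    from sqlalchemy.types import TypeDecorator

    class Preferences(object):
        """Plain object to be stored as a JSONB document."""
        def __init__(self, theme=None, language=None):
            self.theme = theme
            self.language = language

    class PreferencesType(TypeDecorator):
        impl = JSONB

        def process_bind_param(self, value, dialect):
            # object -> dict on the way into the database
            return None if value is None else vars(value)

        def process_result_value(self, value, dialect):
            # dict -> object on the way back out
            return None if value is None else Preferences(**value)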

Disclaimer: I'm the author of Spyne.

I hope it helps!

Best,
Burak

[1]:
https://github.com/arskom/spyne/blob/bc8c3567992abcc387c135d5f8e19c76439c2482/examples/sql/sql_crud.py#L60

[2]:
https://github.com/arskom/spyne/blob/bc8c3567992abcc387c135d5f8e19c76439c2482/spyne/store/relational/document.py#L187

[3]:
https://github.com/arskom/spyne/blob/bc8c3567992abcc387c135d5f8e19c76439c2482/spyne/test/test_sqlalchemy.py#L148

[4]:
https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.attributes.flag_dirty

[5]:
https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE



Re: [sqlalchemy] Attaching a second database to a connection

2020-07-07 Thread Richard Damon
Ok, I guess I knew you could execute explicit SQL but wasn't thinking
about it or coming across it in my searches. In my case I wouldn't want
to automatically connect, as it will be done at a specific time for a
specific operation, so I could do the ATTACH specifically.

Will I need to explicitly recreate all the schema for the tables? This
second database will have an identical schema to the main (in fact, it
will be generated by the program, as this is a way to bring in updates),
though maybe I won't bother setting up a full ORM model and be using
more explicit SQL to get the data.

On 7/7/20 12:46 AM, Mike Bayer wrote:
>
>
> On Mon, Jul 6, 2020, at 11:19 PM, Richard Damon wrote:
>> SQLite allows a program to attach multiple databases to a single
>> connection, and you are able to reference tables in these additional
>> databases with things like schema.table as the name of a table.
>>
>> Is there a way to do this in SQLAlchemy?
>
> sure, you use SQLite's ATTACH DATABASE command, usually using an event
> so it occurs for all connections automatically, here is code from our
> test suite:
>
>     from sqlalchemy import event
>
>     engine = create_engine("sqlite://")
>
>     @event.listens_for(engine, "connect")
>     def connect(dbapi_connection, connection_record):
>         dbapi_connection.execute(
>             'ATTACH DATABASE "test_schema.db" AS test_schema'
>         )
>
> then you reference the attached database as a schema, Table(...,
> schema="test_schema")
>
>>
>> I am working on an application that will want to import data from
>> another database (that uses basically the same schema, maybe just a
>> subset of the schema of the main database), and get the updates needed
>> to perform by using a join on unique keys (that aren't necessarily the
>> primary key).
>>
>> After finishing the update, and pulling the information in (remapping
>> rowid/primary keys <-> foreign keys that didn't match between the
>> databases) I would then detach this database (which ideally I opened as
>> a read only connection).
>>
>> I can see how to establish multiple engines and sessions, but then I
>> can't do the join between the databases which would let me do a lot of
>> the work down in the database engine. I also have found being able to
>> bind different sets of tables into different engines, but in my case the
>> database will have the same set of tables, so this doesn't look to work.
>>
>> -- 
>> Richard Damon
>>

-- 
Richard Damon



Re: [sqlalchemy] convert subset to dictionary

2020-07-07 Thread Justvuur
I'm currently testing with 7000 students with 181 subjects.
I first went over to the DB to run the query directly on there. I did an
inner join between the two tables.
It took about 2:25 minutes. I then selected specific columns from the two
tables and the time dropped to about 2 minutes or so.
I then tried a "select subjects where student ID in (...)" query and that
executed a lot quicker than the others.

So then on the Python side of things, I first fetched the students and only
selected the id, created_on and updated_on columns using the with_entities
option.
That seemed to take less than 1 second. I then, for each student, executed
a DB query to fetch the subjects using the with_entities option.
For some reason, this works faster than using
eager-loading/students.subjects/attribute_mapped_collection, which takes
about 2 minutes longer than this (supposedly less efficient) approach.
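
Roughly, the idea looks like this (a simplified, untested sketch that uses
one IN query for the subjects rather than one query per student; model
names as in the code quoted below):

    # narrow query for the students: only the columns the CSV needs
    students = (
        db.session.query(Student)
        .with_entities(Student.id, Student.created_on, Student.updated_on)
        .all()
    )

    student_ids = [s.id for s in students]

    # one IN query for all subjects, grouped per student in Python
    subject_rows = (
        db.session.query(Subject)
        .with_entities(Subject.student_id, Subject.code, Subject.value)
        .filter(Subject.student_id.in_(student_ids))
        .all()
    )

    subjects_by_student = {}
    for student_id, code, value in subject_rows:
        subjects_by_student.setdefault(student_id, {})[code] = value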

On Tuesday, 7 July 2020 at 12:50:13 UTC+2 Simon King wrote:

> How long is it taking? You mentioned 2000 students with 100 subjects
> each, so there are something like 200,000 rows in the Subjects table,
> and you need to load all of it. I wouldn't expect that to take longer
> than a couple of seconds though.
>
> Simon
>
> On Mon, Jul 6, 2020 at 7:34 PM Justvuur  wrote:
> >
> > I added the eager-loading but it seems to slow down the SQL query quite
> > a lot.
> > It's as if now, the SQL query is taking longer but the generating of the
> > file is quicker hehe... I guess now the queries are being fired before to
> > populate subjects.
> > It's still taking relatively the same amount of time though.
> >
> >
> > On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote:
> >>
> >> Are you eager-loading the "student.subjects" relationship? If not,
> >> that will give you the biggest performance increase. Without that, you
> >> will be issuing a separate DB query for each of the students, to load
> >> that student's subjects. Eager-loading allows you to preload the
> >> subjects for every student in a single query:
> >>
> >> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
> >>
> >> Simon
> >>
> >> On Fri, Jul 3, 2020 at 1:36 PM Justvuur  wrote:
> >> >
> >> > Hi Simon, thanks for the help! I've never used that before, it's
> >> > quite handy.
> >> >
> >> > I'm looping through all the students and printing them and their
> >> > subject details to a CSV file.
> >> > What makes things a tad complicated is the subjects must appear in a
> >> > specific order.
> >> > There is a table that has the subject code and order number
> >> > (ordered_subjects used below is the resultset from it).
> >> > I printed out the timing and found the problem to be with a nested
> >> > for loop.
> >> >
> >> > I was hoping to reduce that process time by using a map that
> >> > automatically gets populated instead of having to create it on the fly.
> >> >
> >> > Before - subjects_collection "attribute_mapped_collection":
> >> >
> >> > for row in students:
> >> >     row_no += 1
> >> >
> >> >     for subject in row.subjects:
> >> >         student_subjects[subject.code] = subject.value
> >> >
> >> >     csv_row = [row_no]
> >> >     csv_row += [student_subjects.get(x.code, '') for x in ordered_subjects]
> >> >     csv_row += [row.created_on, row.updated_on]
> >> >
> >> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
> >> >
> >> >
> >> > After adding the subjects_collection "attribute_mapped_collection", I
> >> > unfortunately did not see a change in performance.
> >> >
> >> > After - subjects_collection "attribute_mapped_collection":
> >> >
> >> > for row in students:
> >> >     row_no += 1
> >> >     csv_row = [row_no]
> >> >     csv_row += [row.subjects_collection.get(x.code, '').value for x in ordered_subjects]
> >> >     csv_row += [row.created_on, row.updated_on]
> >> >
> >> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
> >> >
> >> >
> >> > class Subject(db.Model):
> >> >     __tablename__ = 'subjects'
> >> >
> >> >     student_id = db.Column(db.Integer, db.ForeignKey('students.id'), primary_key=True)
> >> >
> >> >     code = db.Column(db.String(50), primary_key=True)
> >> >
> >> >     value = db.Column(db.String)
> >> >
> >> >     def __init__(self, code, value):
> >> >         self.code = code
> >> >         self.value = value
> >> >
> >> >
> >> > class Student(ResourceMixin, db.Model):
> >> >     __tablename__ = 'students'
> >> >
> >> >     subjects = db.relationship('Subject', backref='student')
> >> >
> >> >     id = db.Column(db.Integer, primary_key=True)
> >> >
> >> >     subjects_collection = relationship("Subject",
> >> >         collection_class=attribute_mapped_collection('code'))
> >> >
> >> > Can you see a way I can optimize this? Any ideas?
> >> >
> >> >
> >> > On Friday, 3 July 2020 12:31:03 UTC+2, Simon King wrote:
> >> >>
> >> >> Are 

Re: [sqlalchemy] convert subset to dictionary

2020-07-07 Thread Simon King
How long is it taking? You mentioned 2000 students with 100 subjects
each, so there are something like 200,000 rows in the Subjects table,
and you need to load all of it. I wouldn't expect that to take longer
than a couple of seconds though.

Simon

On Mon, Jul 6, 2020 at 7:34 PM Justvuur  wrote:
>
> I added the eager-loading but it seems to slow down the SQL query quite a lot.
> It's as if now, the SQL query is taking longer but the generating of the file 
> is quicker hehe... I guess now the queries are being fired before to populate 
> subjects.
> It's still taking relatively the same amount of time though.
>
>
> On Friday, 3 July 2020 17:07:43 UTC+2, Simon King wrote:
>>
>> Are you eager-loading the "student.subjects" relationship? If not,
>> that will give you the biggest performance increase. Without that, you
>> will be issuing a separate DB query for each of the students, to load
>> that student's subjects. Eager-loading allows you to preload the
>> subjects for every student in a single query:
>>
>> https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#joined-eager-loading
>>
>> Simon
>>
>> On Fri, Jul 3, 2020 at 1:36 PM Justvuur  wrote:
>> >
>> > Hi Simon, thanks for the help! I've never used that before, it's quite 
>> > handy.
>> >
>> > I'm looping through all the students and printing them and their subject 
>> > details to a CSV file.
>> > What makes things a tad complicated is the subjects must appear in a 
>> > specific order.
>> > There is a table that has the subject code and order number 
>> > (ordered_subjects used below is the resultset from it).
>> > I printed out the timing and found the problem to be with a nested for 
>> > loop.
>> >
>> > I was hoping to reduce that process time by using a map that automatically 
>> > gets populated instead of having to create it on the fly.
>> >
>> > Before - subjects_collection "attribute_mapped_collection":
>> >
>> > for row in students:
>> >     row_no += 1
>> >
>> >     for subject in row.subjects:
>> >         student_subjects[subject.code] = subject.value
>> >
>> >     csv_row = [row_no]
>> >     csv_row += [student_subjects.get(x.code, '') for x in ordered_subjects]
>> >     csv_row += [row.created_on, row.updated_on]
>> >
>> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
>> >
>> >
>> > After adding the subjects_collection "attribute_mapped_collection", I
>> > unfortunately did not see a change in performance.
>> >
>> > After - subjects_collection "attribute_mapped_collection":
>> >
>> > for row in students:
>> >     row_no += 1
>> >     csv_row = [row_no]
>> >     csv_row += [row.subjects_collection.get(x.code, '').value for x in ordered_subjects]
>> >     csv_row += [row.created_on, row.updated_on]
>> >
>> >     writer.writerow([x.encode('utf-8') if type(x) == unicode else x for x in csv_row])
>> >
>> >
>> > class Subject(db.Model):
>> >     __tablename__ = 'subjects'
>> >
>> >     student_id = db.Column(db.Integer, db.ForeignKey('students.id'), primary_key=True)
>> >
>> >     code = db.Column(db.String(50), primary_key=True)
>> >
>> >     value = db.Column(db.String)
>> >
>> >     def __init__(self, code, value):
>> >         self.code = code
>> >         self.value = value
>> >
>> >
>> > class Student(ResourceMixin, db.Model):
>> >     __tablename__ = 'students'
>> >
>> >     subjects = db.relationship('Subject', backref='student')
>> >
>> >     id = db.Column(db.Integer, primary_key=True)
>> >
>> >     subjects_collection = relationship("Subject",
>> >         collection_class=attribute_mapped_collection('code'))
>> >
>> > Can you see a way I can optimize this? Any ideas?
>> >
>> >
>> > On Friday, 3 July 2020 12:31:03 UTC+2, Simon King wrote:
>> >>
>> >> Are you trying to optimise the database access (ie. minimize the
>> >> number of queries), or provide a nice dictionary-style API for your
>> >> Student objects? What do you mean when you say that looping over
>> >> student.subjects is quite heavy?
>> >>
>> >> An association proxy can be used to get dict-style access to a 
>> >> relationship:
>> >>
>> >> 
>> >> https://docs.sqlalchemy.org/en/13/orm/extensions/associationproxy.html#proxying-to-dictionary-based-collections
>> >>
>> >> There are also a couple of examples in the SQLAlchemy docs that
>> >> provide a dictionary-style API:
>> >>
>> >> 
>> >> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.dynamic_dict
>> >>
>> >> 
>> >> https://docs.sqlalchemy.org/en/13/orm/examples.html#module-examples.vertical
>> >>
>> >> Hope that helps,
>> >>
>> >> Simon
>> >>
>> >> On Thu, Jul 2, 2020 at 8:46 PM Justvuur  wrote:
>> >> >
>> >> > Hi there,
>> >> >
>> >> > I'm struggling to find an efficient way to get a two columned subset 
>> >> > into dictionary form.
>>