On Sun, Jun 28, 2009 at 6:31 PM, Ian Jamieson <iajamie...@gmail.com> wrote:
> I have a similar question, but what I'm interested in is how to
> deal with a large number of records? In the previous example, if
> there are thousands of users, paginate could end up with something
> like 200 pages. Suppose one day I had to do some sort of online
> report -- how do I try not to use all the server resources?
That's basic SQL planning. Keep all the hard work inside the database
engine, and fetch as little data into Python as possible. Add indexes
for columns used in 'where' or 'order by' clauses. Take advantage of
aggregate functions and 'group by'. Use 'limit' and 'offset' if you're
only interested in a subset of the results (this is what Paginate
does). Both MySQL and PostgreSQL have extensive documentation on how
to optimize queries; look for the 'EXPLAIN' and 'ANALYZE' commands.

If your table has several large text fields that you use only
occasionally, put them in a 'defer' group in the mapper, so they will
be fetched only when you tell it to:

    # Model ('ec' here is the table's column collection, e.g. ec = t_entry.c)
    orm.mapper(Entry, t_entry, properties={
        "content": orm.deferred(ec.content, group="details"),
        }, order_by=[ec.entry_date.desc()])

    # Controller code, or class method in model
    q = meta.Session.query(Entry)
    q = q.options(orm.undefer_group("details"))

Here I defer all columns except those used in searches and index
pages. The main query will ignore them, but SQLAlchemy will fetch them
just-in-time if I access them. But I know which pages I need them for
-- the details page and the modify form -- and those happen to be the
pages where I fetch only one record. So I undefer the group on those
pages, and SQLAlchemy includes them in the main query.

Sometimes it's more efficient to fetch slightly more than you need, if
you can leverage an index and each record is short. Say you have to do
some complex search or reporting that's cumbersome to translate to
SQL. Fetch a single column and the primary key, using 'where' to knock
off any chunks that are easy to specify, then do the final calculation
in Python. This is also useful if you have to do two different
calculations on the same column, maybe on different subsets of the
records: you can make one inclusive query that fetches the records for
both, and then do both calculations in parallel in Python.
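The single-column-plus-primary-key approach might look like this -- a
minimal sketch using the stdlib sqlite3 module, with an invented
'entry' table and 'status'/'score' columns (none of these names come
from the thread):

```python
import sqlite3

# Invented schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, status TEXT, score REAL)")
conn.executemany(
    "INSERT INTO entry (status, score) VALUES (?, ?)",
    [("active", 10.0), ("active", 30.0), ("archived", 99.0), ("active", 20.0)],
)

# Knock off the chunk that's easy to specify in SQL ('where'), and
# fetch only the primary key and the one column we need -- not whole
# records.
rows = conn.execute(
    "SELECT id, score FROM entry WHERE status = 'active'"
).fetchall()

# Two different calculations over the same fetched rows, done in
# parallel in Python rather than in two separate queries.
total = sum(score for _id, score in rows)
high_ids = [id_ for id_, score in rows if score >= 20.0]

print(total)     # 60.0
print(high_ids)  # [2, 4]
```

The point is that only two small columns cross the wire, while the
'where' clause still lets the database discard the easy cases.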
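The 'limit'/'offset' pagination point above can be sketched the same
way (again stdlib sqlite3 with a made-up 'users' table; Paginate does
essentially this under the hood):

```python
import sqlite3

# Invented table for illustration: a thousand users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("user%03d" % i,) for i in range(1000)])
# An index on the 'order by' column lets the engine walk straight to
# the requested page instead of sorting the whole table.
conn.execute("CREATE INDEX ix_users_name ON users (name)")

def fetch_page(page, per_page=20):
    """Fetch one page of results; only per_page rows reach Python."""
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id, name FROM users ORDER BY name LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()

page3 = fetch_page(3)
print(len(page3))    # 20
print(page3[0][1])   # user040
```

Even with 200 pages' worth of rows in the table, each request only
materializes 20 of them in Python.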
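One more sketch, anticipating the cache-table suggestion below:
precalculate aggregates into a small summary table (e.g. nightly) so
the on-demand report reads only the summary. Table and column names
here ('sale', 'monthly_totals') are made up for the example:

```python
import sqlite3

# Invented detail table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale (month, amount) VALUES (?, ?)",
    [("2009-05", 100.0), ("2009-05", 50.0), ("2009-06", 25.0)],
)

# Nightly job: aggregate into a cache table, keeping the hard work
# (GROUP BY, SUM) inside the database engine.
conn.execute("CREATE TABLE monthly_totals (month TEXT PRIMARY KEY, total REAL)")
conn.execute(
    "INSERT INTO monthly_totals (month, total) "
    "SELECT month, SUM(amount) FROM sale GROUP BY month"
)

# Report page: one cheap lookup against the small cache table,
# never a scan of the big detail table.
report = dict(conn.execute("SELECT month, total FROM monthly_totals"))
print(report)  # {'2009-05': 150.0, '2009-06': 25.0}
```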
When you have to do ad hoc calculations over a large number of
records, and you only need one or two columns, make a SQL builder
query rather than an ORM query. It's several times faster.

You can also precalculate report statistics at night and save them in
cache tables. So if your report shows monthly results, you can put the
intermediate results in a Monthly table, and then the report will come
up faster on demand.

--
Mike Orr <sluggos...@gmail.com>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "pylons-discuss" group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to
pylons-discuss+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---