On Sun, Jun 28, 2009 at 6:31 PM, Ian Jamieson <iajamie...@gmail.com> wrote:
> I have a similar question, but what I'm interested in is how to
> deal with a large number of records?
> In the previous example, if there are thousands of users,
> paginate could end up with something like 200 pages.
> Suppose one day I had to do some sort of online report;
> how do I try not to use all the server resources?

That's basic SQL planning.  Keep all the hard work inside the database
engine, and fetch as little data into Python as possible.  Add
indexes for columns used in 'where' or 'order by' clauses.  Take
advantage of aggregate functions and 'group by'.  Use 'limit' and
'offset' if you're only interested in a subset of results (this is
what Paginate does).  Both MySQL and Postgres have extensive
documentation on how to optimize queries; look for the 'EXPLAIN' and
'ANALYZE' commands.
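
For instance, here's a minimal sketch of pushing the counting and
paging into the database (it assumes the mapped Entry class from the
example below, with an integer primary key 'id' and an index on
entry_date; the page size is made up):

from sqlalchemy import func

# Count in SQL instead of fetching every row and using len() in Python:
n_entries = meta.Session.query(func.count(Entry.id)).scalar()

# Let the database sort (via the entry_date index) and slice one page:
page = meta.Session.query(Entry) \
    .order_by(Entry.entry_date.desc()) \
    .limit(50).offset(100) \
    .all()   # rows 101-150, i.e. page 3 at 50 per page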

If your table has several large text fields that you use only
occasionally, put them in a 'defer' group in the mapper, so they will
only be fetched when you tell it to.

# Model ('ec' is shorthand for the table's column collection)
from sqlalchemy import orm

ec = t_entry.c
orm.mapper(Entry, t_entry, properties={
    # 'content' is a large text column, loaded only on demand
    "content": orm.deferred(ec.content, group="details"),
    },
    order_by=[ec.entry_date.desc()])

# Controller code, or class method in model
q = meta.Session.query(Entry)
q = q.options(orm.undefer_group("details"))  # load 'details' columns up front

Here I defer all columns except those used in searches and index
pages.  The main query will ignore them, but SQLAlchemy will fetch
them just-in-time if I access them.  But I know which pages I need
them for -- the details page and the modify form -- and those happen
to be the pages where I fetch only one record.  So I undefer the group
on those pages, and SQLAlchemy includes them in the main query.

Sometimes it's more efficient to fetch slightly more than you need,
if you can leverage an index and each record is short.  Say you have
to do some complex search or reporting that's cumbersome to translate
to SQL.  Fetch a single column and the primary key, using 'where' to
knock off any chunks that are easy to specify.  Then do your final
calculation in Python.  This is also useful if you have to do two
different calculations on the same column, perhaps over different
subsets of records.  You can make one inclusive query that fetches
the records for both, and then do both calculations side by side in
Python.
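
A rough sketch of that pattern (the 'score' column, the NULL filter,
and the threshold are all made up for illustration):

# Fetch only the primary key and one short column; the 'where' clause
# trims the easy cases, and the fiddly logic runs in Python.
rows = meta.Session.query(Entry.id, Entry.score) \
    .filter(Entry.score != None) \
    .all()

# One inclusive query feeds two different calculations:
total_score = sum(score for id, score in rows)
high_scorers = [id for id, score in rows if score > 100]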

When you have to do ad hoc calculations over a large number of
records, and you only need one or two columns, make a SQL builder
query rather than an ORM query.  It's several times faster.
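
For example, a sketch of the same two-column fetch through the SQL
expression layer (no Entry objects are constructed, just plain row
tuples):

from sqlalchemy import sql

# Builder query: returns bare tuples instead of Entry instances,
# skipping the per-object overhead of the ORM.
q = sql.select([t_entry.c.id, t_entry.c.entry_date])
rows = meta.Session.execute(q).fetchall()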

You can also precalculate report statistics at night and save them in
cache tables.  So if your report shows monthly results, you can put
the intermediate figures in a Monthly table, and then the report will
come up faster on demand.
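
Here's a sketch of such a nightly rollup (the 'monthly_stats' cache
table t_monthly and its columns are hypothetical, and date_trunc is
Postgres-specific):

from sqlalchemy import func, sql

# Run from a nightly cron job: roll entries up by month so the report
# page only has to read a handful of precomputed rows.
month = func.date_trunc('month', t_entry.c.entry_date)
q = sql.select([month.label('month'),
                func.count(t_entry.c.id).label('entry_count')]) \
    .group_by(month)
for month_start, n in meta.Session.execute(q).fetchall():
    meta.Session.execute(t_monthly.insert().values(
        month=month_start, entry_count=n))
meta.Session.commit()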

-- 
Mike Orr <sluggos...@gmail.com>
