On Wed, Feb 03, 2010 at 10:40:12AM -0800, Ihnen, David wrote:
> It takes four hours to run the query? I have written queries
> affecting grouping and sorting 6 billion rows that ran several
> orders of magnitude faster than that.
Well, I'm pretty good at writing SQL queries, but I've never claimed to be an expert, and efficiently querying a data set this large is new to me.

> I suspect that you need to consider appropriate indexes to support
> your where clauses?

I've indexed the primary table, created a view in the db, and verified that the view is properly using the index. However, I need to drill down into the PostgreSQL docs to pick up more hints on how to devise useful partial indexes (I can't yet see how to apply them here) or a better index over my search expression.

> If you truly working with data of that billion-record magnitude,
> your heap is too big to query this way. You'll need to take an

Ah. I have about 3.5 billion records now, holding _most_ of 2007. This is expected to grow as I add 08, 09, ...

> alternative approach such as denormalizing the data store into
> derived data closer to the final form that will be needed, then
> maintain the denormalized data to be in sync. (That denormalization
> could take the form of creating a smaller summary table, which could
> be queried with DBIx::Class - I did that for statistics reporting
> before. Though I could query the stats directly, it was very
> expensive.)

I guess I *am* trying to denormalize the data right now. I'm trying to get the data out of PostgreSQL and into CouchDB, which I've found to be extremely fast for the 'front end' queries I need to do once I've set up the proper view. My goal for this perl/sql step is to summarize the raw data into 5-minute aggregates.

I think your advice is good. My SQL view is excessively complicated and is probably the root of the problem. I spent the morning looking at statement caching and so on, and I've concluded that's a useless optimization, so I'll stop wondering whether the problem is sth caching and start concentrating on helping PostgreSQL do a better job. I'll take your suggestion, break the problem down into parts, and maybe do more work in Perl and less in SQL.
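The 5-minute rollup described above can be done entirely in SQL by integer-dividing the timestamp into 300-second buckets and grouping on the bucket. A minimal sketch, using SQLite in Python as a stand-in for PostgreSQL; the `raw_log` table and its columns are illustrative, not the schema from this thread:

```python
import sqlite3

# Hypothetical raw table: an epoch timestamp, a host, and a measured value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_log (ts INTEGER, host TEXT, value REAL)")
conn.executemany(
    "INSERT INTO raw_log VALUES (?, ?, ?)",
    [(1200000000 + i * 60, "web1", float(i)) for i in range(20)],
)

# Roll the raw rows up into 5-minute (300-second) buckets -- the kind of
# summary table the thread suggests maintaining alongside the raw data.
rows = conn.execute(
    """
    SELECT (ts / 300) * 300 AS bucket,
           host,
           COUNT(*)   AS n,
           SUM(value) AS total
    FROM raw_log
    GROUP BY bucket, host
    ORDER BY bucket
    """
).fetchall()
for bucket, host, n, total in rows:
    print(bucket, host, n, total)
```

The same `GROUP BY` runs unchanged on PostgreSQL (where `date_trunc` or `width_bucket` are also options), and the result set is exactly the denormalized summary table that could then be pushed into CouchDB.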
Still, given how OFFSET and LIMIT work (according to the PostgreSQL planner output), it irritates me that I have to run a monster SELECT only to keep a subset of its rows, then run the exact same SELECT again and again for the other subsets.

Thanks for the comments,

James

> David

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/dbix-class@lists.scsys.co.uk
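The repeated-SELECT pattern complained about above is the classic cost of OFFSET-based paging: each page re-scans and discards all rows before the offset. One common alternative is keyset ("seek") pagination, where the client remembers the last key seen and asks only for rows after it. A sketch under illustrative table/column names, again using SQLite in Python as a stand-in:

```python
import sqlite3

# Hypothetical table with a monotonically increasing primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events(payload) VALUES (?)",
    [("row-%d" % i,) for i in range(10)],
)

def page(conn, last_id, limit):
    # Seek past the last key instead of OFFSETting over already-seen rows;
    # with an index on id this never re-scans earlier pages.
    return conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()

last_id, pages = 0, []
while True:
    batch = page(conn, last_id, 4)
    if not batch:
        break
    pages.append(batch)
    last_id = batch[-1][0]

print([len(p) for p in pages])  # -> [4, 4, 2]
```

Each page is an index range scan starting at the remembered key, so the cost per page stays flat instead of growing with the offset, which is exactly the behavior the planner output was showing for OFFSET/LIMIT.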