On Aug 11, 2010, at 8:34 PM, Robert Collins wrote:

> All the times add up substantially; get 100 products in a productset
> and we're doing 300 queries just-like-that. (Launchpad-project, or
> zope?)
>
> That's roughly, in pseudo code - because I haven't looked at this
> particular case's Python yet:
>
> for product in (some query):
>     for distro in product.getDistroSeries():
>         for milestone in distro.getMileStones()
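To make the N+1 pattern in the quoted pseudocode concrete, here is a minimal, runnable sketch using an in-memory SQLite database. The table and column names are purely illustrative stand-ins, not Launchpad's real data model; the point is only the difference between one query per row and one joined query.

```python
import sqlite3

# Hypothetical schema standing in for product/series/milestone tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE distroseries (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT);
    CREATE TABLE milestone (id INTEGER PRIMARY KEY, series_id INTEGER, name TEXT);
    INSERT INTO product VALUES (1, 'launchpad'), (2, 'zope');
    INSERT INTO distroseries VALUES (1, 1, 'lucid'), (2, 2, 'maverick');
    INSERT INTO milestone VALUES (1, 1, '1.0'), (2, 1, '1.1'), (3, 2, '2.0');
""")

def milestones_n_plus_1():
    # The quoted pattern: one query per product, then per series --
    # round trips grow linearly with the number of rows.
    results = []
    for (pid,) in conn.execute("SELECT id FROM product"):
        for (sid,) in conn.execute(
                "SELECT id FROM distroseries WHERE product_id = ?", (pid,)):
            for (name,) in conn.execute(
                    "SELECT name FROM milestone WHERE series_id = ?", (sid,)):
                results.append(name)
    return results

def milestones_one_query():
    # The same answer in a single round trip, pushing the joins into SQL.
    return [name for (name,) in conn.execute("""
        SELECT m.name
        FROM product p
        JOIN distroseries d ON d.product_id = p.id
        JOIN milestone m ON m.series_id = d.id
        ORDER BY m.id
    """)]
```

With 100 products the first function issues hundreds of statements where the second issues one, which is exactly the "300 queries just-like-that" cost being described.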
What happens inside this innermost for loop is actually what is most interesting to me. If this is what happens:

    product_distro_milestones.append(x)

then this can actually be parallelized quite a bit; whether through async/event-driven methods or threads is really more of a preference. Whether you actually want to bomb the DB server with 2, or 10, or 100 queries at once is another question entirely. Essentially this is the classic cores/spindles/RAM scaling case: if none of these steps depend on one another, it may be trivial to get them all running at once. I'm partial to Gearman for farming work like this out, btw. ;) www.gearman.org

Of course, another question is why these loops are running queries at all instead of building criteria for selects/unions. (Disclaimer: I'm still not very familiar with Launchpad's data model.)

> How can we cache things today? What are the options?

I think caching at a low level, while a good idea, can carry such a high technical debt that it's often not really worth it. In most scenarios with caching, the obvious things come first: common views that don't need to be up to date in real time, very expensive operations where any bit of popularity can cause a site outage, etc. But once you've done that, the more complex performance problems show up on your radar, and you're left with a dilemma. You've gotten really good at caching simple data-access patterns, and it has garnered a huge gain in performance, but doing it for complex data structures does not scale at the same rate.

It always jumps to mind that you can just cache/invalidate in the data model. Even if the ORM and associated tools are incredible, doing this is, as you suggest, non-trivial, and the amount of code written vs. the actual gain in performance is usually a disappointment. Meanwhile, miss one little area for invalidation/recalculation and you get weird, hard-to-reproduce issues.
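The fan-out idea mentioned above can be sketched with the standard library's ThreadPoolExecutor rather than Gearman; the per-product fetch function is a hypothetical stand-in for a real DB query, and the pool size is the knob for how many queries hit the server at once.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_milestones(product):
    # Hypothetical stand-in: in reality this would run one DB query.
    # The calls are independent of each other, so they can overlap.
    return ["%s-%s" % (product, n) for n in ("1.0", "1.1")]

products = ["launchpad", "zope", "storm"]

# Fan the independent queries out across a small worker pool.
# max_workers bounds the concurrent load on the DB server.
with ThreadPoolExecutor(max_workers=2) as pool:
    all_milestones = [m for ms in pool.map(fetch_milestones, products)
                      for m in ms]
```

`pool.map` preserves input order, so the flattened result reads the same as the serial loop would, just without waiting for each query to finish before starting the next.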
Far more interesting, to me, is to move data like this into scaled-out, de-normalized data caches that are simply more oriented around the queries that cause the most pain from a relational standpoint. Sometimes materialized views make sense for this; other times pushing into key/value stores works. Sometimes, while seemingly not a "search", pushing complex queries into a search engine like SOLR or Sphinx works wonderfully. But generally, if you're waiting for a user to ask for a complex view, that is a lot of work (even with caching) that you could have done as soon as the data was written (asynchronously, with a queueing system).
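The write-time approach can be sketched in-process with a queue and a worker thread; the dict-backed "store" and the rebuild logic are stand-ins for a real key/value store and queueing system (Gearman, etc.), not Launchpad code.

```python
import queue
import threading

denormalized = {}                         # stand-in key/value cache
source = {"launchpad": ["1.0", "1.1"]}    # stand-in relational data
jobs = queue.Queue()

def worker():
    # Background consumer: rebuilds the denormalized view so reads
    # never pay for the aggregation. None is a shutdown sentinel.
    while True:
        product = jobs.get()
        if product is None:
            break
        # The "expensive" aggregation happens here, off the read path.
        denormalized[product] = list(source[product])
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# On write: update the source of truth, then queue the rebuild.
source["launchpad"].append("1.2")
jobs.put("launchpad")

jobs.join()       # wait for the rebuild (a real system wouldn't block)
jobs.put(None)
t.join()
```

A request for the complex view then becomes a single key lookup in `denormalized`, because the work was done when the data changed rather than when the user asked.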

