On Fri, Aug 13, 2010 at 4:27 PM, Clint Byrum <[email protected]> wrote: > > On Aug 11, 2010, at 8:34 PM, Robert Collins wrote: >> >> All the times add up substantially, get 100 products in a productset >> and we're doing 300 queries just-like-that. (Launchpad-project, or >> zope?) >> >> Thats roughly, in pseudo code - because I haven't looked at this >> particular cases python yet: >> for product in (some query): >> for distro in product.getDistroSeries(): >> for milestone in distro.getMileStones() > > What happens inside this most inner for loop is actually what is most > interesting to me. > > If this is what happens: > > product_distro_milestones.append(x)
Ah, so I was flattening out a tal template :) The inner loop will be pulling attributes out of the milestone and rendering them. The whole thing in principle can be parallelised but... > Then this can actually be parallelized quite a bit, whether through > async/event driven methods or threads is really more of a preference. > Whether you actually want to bomb the DB server with 2, or 10, or 100 > queries at once is another question entirely. The DB can generate a data set for this sort of thing in 50-100ms, if we get all the rows at once. Doing multiple queries to express a single intent seems a bit unpleasant :) - I'm very interested in why you seem to suggest this is a positive thing? > Essentially this is the classic case for cores/spindles/RAM for scaling, > in that if none of these steps are dependent on one another, it may be > trivial to get them to run all at once. Right - I would express the intent to the DB layer and expect it to parallelise appropriate - e.g. like cassandra. > I'm partial to Gearman for farming work like this out btw. ;) > > www.gearman.org I had been meaning to ask - we have some gearman like needs at the moment, and are installing rabbitMQ. Do you see gearman as a replacement for rabbit, or something that might be complementary? I don't think we'd really want the cognitive overhead of two nearly-identical things. > Of course, another question is why are these loops running queries > instead of building criteria for selects/unions? (Disclaimer: I'm still > not very familiar with Launchpad's data model) That is actually the point of this thread: the ORM usage style we have drives a its-like-python API for talking to the DB, (which we got from sqlobject waay back, but storm also encourages/is designed to deliver this sort of thing). I think that we can get much better results by separating the concerns out. ... Trimmed caching digression - I completely agree with what you're saying, and there are in fact many caveats around what we can do now, and should aim for in the future. > Far more interesting, to me, is to move data like this into scale-out > de-normalized data caches that are simply more oriented around the queries > that give the most pain from a relational standpoint. Sometimes > materialized views make sense for this, other times pushing into key/value > stores works. Sometimes, while seemingly not a "search", pushing complex > queries into a search engine like SOLR or Sphinx works wonderfully for > this. I want to put a group together to do a solr prototype for LP :- if you can get some cycles to help us with that, that would be awesome. We have a number of denormalised caches in LP's data model already - some are pulling their weight, some perhaps not :- but our performance is currently all over the shop, so its a bit hard to tell. > But generally, if you're waiting for a user to ask for a complex view, > that is a lot of work (even with caching) that you could have done as > soon as the data was written (asynchronously w/ a queueing system). Yep - thats really at a different level the performance analysis stack - its something we do already in some places : as Danilo says, we've solved some pretty hairy DB performance things in the past and continue to do so. -Rob _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : [email protected] Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp

