On Tue, Mar 13, 2012 at 7:19 AM, David Cantrell <da...@cantrell.org.uk> wrote:
> In terms of what the API (or the MongoDB thing) looks like, to cut down
> on the traffic, CPANdeps can live with just summaries of
> dist/distversion/perlversion/os/state, and doesn't need the report
> bodies. Report bodies should probably be available as a separate
> object.

That's exactly the way Metabase is designed. "index data" lives in one DB and "bodies" live in another. Currently, that's S3 for bodies and SimpleDB for index data.

Step 1 is moving from SimpleDB to MongoDB. Maybe we'll eventually also migrate away from S3, but that's not as big a priority since it's pretty static and cheap. Queries on SimpleDB have been driving up cost to the point where it's insane to stay on it much longer.

So setting up slave MongoDB instances would replicate the index data, which is pretty much what you want. (Probably more than you need, but it might be easier/faster than getting the smaller data set.)

-- 
David