On Thu, Jan 27, 2011 at 9:12 AM, Jacob Appelbaum wrote:
[...]
> I'd like to run these queries on a live DB. I don't currently have a
> machine to load these files where it won't take a century, so I'm going
> to punt and see if Seth has any suggestions. If he doesn't, I'll find a
> fast machine for some computing...
Cut the files down; take a random sample of (say) a million
certificates. This should fit easily into a one-computer DB.

Rationale: for the purpose of fingerprint normalization, we don't
actually care about answering questions like "does anybody at all do X
with their certificates?" We care about an easier question: "can a
censor that wants to allow SSL but not Tor afford to block everybody
whose certificates look like X?" The version of this that the SSL
Observatory can answer boils down to something like "Is there a hefty
fraction of SSL certificates that look like X?" And *this* is a
question that can get answered with a random sample.

yrs,
--
Nick
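A minimal sketch of the suggested approach, assuming the certificate dump can be read as a stream of records: reservoir sampling (Algorithm R) draws a uniform random sample in a single pass without loading the whole dump, and a normal-approximation (Wald) interval shows how tightly the sample pins down "what fraction of certificates look like X". The record type and the `looks_like_x` predicate are hypothetical placeholders; this is not the Observatory's own code or schema.

```python
import math
import random
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length
    (Algorithm R), using O(k) memory and one pass over the data."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k/(i+1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_fraction(sample: List[T], looks_like_x: Callable[[T], bool],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate the population fraction matching a predicate, with a
    normal-approximation (Wald) interval; z=1.96 gives roughly 95%."""
    n = len(sample)
    if n == 0:
        raise ValueError("empty sample")
    p = sum(1 for c in sample if looks_like_x(c)) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    rng = random.Random(2011)
    # Hypothetical stand-in for a large certificate dump: integers,
    # with "looks like X" meaning "divisible by 4" (true fraction 0.25).
    population = range(1_000_000)
    sample = reservoir_sample(population, 10_000, rng)
    p, lo, hi = estimate_fraction(sample, lambda c: c % 4 == 0)
    print(f"estimated fraction {p:.4f} (95% CI {lo:.4f}..{hi:.4f})")
```

The interval width depends on the sample size, not on the size of the full dump: with a million sampled certificates the 95% half-width is at most 1.96 * sqrt(0.25 / 1,000,000), about 0.001, so any "hefty fraction" shows up clearly.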
