>>1) We've gathered a TB of data from CommonCrawl and we run regression tests
>>against this TB (thank you, Rackspace for hosting our vm!) to try to identify
>>these problems.

And if anyone with connections at a big company doing open source + cloud would be interested in floating us some storage and cycles, we'd be happy to move off our single vm to increase coverage and improve the speed for our large-scale regression tests. :D

But seriously, thank you for this discussion and collaboration!

Cheers,

Tim