I'm writing a reporting tool wherein we have many customers who subscribe to this SaaS. There are millions of rows of data per customer. All customers are islands from each other (of course).
Are there any major issues or benefits to storing each customer in their own database (with their own tables), versus lumping them all into a single database?

At first thought, it seems that by separating them, queries should be faster, no? (There is less data to sift through per customer.) It of course makes upgrading the table schema a wee bit more cumbersome, but a simple loop and script can handle that easily enough. And since you can query across databases, we can still build internal aggregate reports for our own use, for example: SELECT * FROM customerA.foo.bar JOIN customerB.foo.bar; or we can use UNIONs, etc., too.

Consolidating everything into one database would seem to bloat the tables and slow things down (or does the fact that MySQL uses B-trees invalidate that theory?). It also means we'd basically need a customer_id column in every table (or some FK to distinguish whose data is whose). It also feels like it could leak data if a malformed query were to get through, although I'm not terribly worried about that, as we do heavy UAT before pushing from DEV to TEST to PROD.

Performance is a major concern here given the huge data sets involved. Does joining across databases impose any speed/performance hit versus joining across tables within a single database?

http://daevid.com
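To make the two layouts concrete, here is a minimal sketch of both options (the `orders` table, its columns, and the customer names are made up for illustration; the real schema would differ):

```sql
-- Option 1: one database per customer, identical table definitions in each.
CREATE TABLE customerA.orders (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    amount     DECIMAL(10,2) NOT NULL,
    created_at DATETIME NOT NULL
);
-- (same CREATE TABLE repeated in customerB, customerC, ...)

-- Internal aggregate report across customers; UNION ALL is fine here
-- because rows from different customers can never be duplicates of
-- each other:
SELECT 'customerA' AS customer, SUM(amount) FROM customerA.orders
UNION ALL
SELECT 'customerB' AS customer, SUM(amount) FROM customerB.orders;

-- Option 2: one shared database, every row tagged with its owner.
CREATE TABLE orders (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    amount      DECIMAL(10,2) NOT NULL,
    created_at  DATETIME NOT NULL,
    PRIMARY KEY (id),
    -- Composite index with customer_id as the leading column, so a
    -- per-customer query descends the B-tree straight to that
    -- customer's range of rows:
    KEY idx_customer_created (customer_id, created_at)
);

SELECT SUM(amount)
FROM orders
WHERE customer_id = 42;
```

The composite index is the piece that bears on the "less data to sift through" intuition: with customer_id leading the index, the per-customer query in the shared table reads only that customer's slice of the B-tree, so the extra rows belonging to other customers mostly cost depth in the tree (logarithmic) rather than a scan over everything.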