I'm writing a reporting tool for a SaaS that has many subscribing
customers. There are millions of rows of data per customer, and all
customers are islands from each other (of course).

Are there any major issues or benefits in storing each customer in
their own database (with their own tables) versus lumping them all into
a single database?

At first thought, it seems that by separating them, queries should be
faster, no (as there is less data to sift through per customer)? It of
course makes upgrading table schemas a wee bit more cumbersome, but a
simple loop and script (see the sketch below) can handle that easily
enough.
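Something like this for the upgrade loop, generating one ALTER per
tenant database from information_schema (this assumes the databases are
named customer% and the table/column names are made up):

    -- emit one ALTER statement per tenant database;
    -- feed the output back into mysql to run them
    SELECT CONCAT('ALTER TABLE ', SCHEMA_NAME,
                  '.bar ADD COLUMN new_col INT NULL;') AS stmt
    FROM information_schema.SCHEMATA
    WHERE SCHEMA_NAME LIKE 'customer%';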
And since you can query across databases, we can still make internal
aggregate reports for our own usage.

For example (MySQL only has two-part database.table naming, so
something like):

    SELECT *
    FROM customerA.bar AS a
    JOIN customerB.bar AS b ON a.id = b.id;

or we can use UNIONs etc. too.
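Say, a roll-up across tenants (table and column names made up):

    -- row counts per customer, one branch per tenant database
    SELECT 'customerA' AS tenant, COUNT(*) AS rows_in_bar
    FROM customerA.bar
    UNION ALL
    SELECT 'customerB', COUNT(*)
    FROM customerB.bar;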

Consolidating them into one would seem to bloat the tables and slow
things down (or does the fact that MySQL uses B-trees invalidate that
theory)? It also means we basically have to have a customer_id column
in every table (or some FK to distinguish whose data is whose). It also
feels like it could leak data if a malformed query were to get through,
although I'm not terribly worried about this as we do some heavy UAT
before pushing from DEV to TEST to PROD.
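If we did consolidate, I assume the standard mitigation is to lead
every index with customer_id so each customer's rows cluster together;
a rough sketch (columns made up):

    CREATE TABLE bar (
        customer_id INT    NOT NULL,  -- tenant discriminator
        id          BIGINT NOT NULL,
        payload     VARCHAR(255),
        -- InnoDB clusters rows on the primary key, so leading with
        -- customer_id keeps each tenant's rows physically together
        PRIMARY KEY (customer_id, id)
    ) ENGINE=InnoDB;

    -- a per-customer query then range-scans only that customer's
    -- slice of the B-tree rather than the whole table
    SELECT * FROM bar WHERE customer_id = 42;

If I understand B-trees right, that means the other tenants' rows
shouldn't really get scanned at all, but I'd like confirmation.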

Performance is a major concern here given the huge data sets involved.
Does joining across databases impose any speed/performance hit vs. just
joining across tables within a single database?
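In case it helps frame an answer, I figure we can at least compare the
query plans (hypothetical tables and join key):

    -- plan for a cross-database join; I'd expect it to look the same
    -- as the equivalent join between two tables in one database
    EXPLAIN
    SELECT a.*
    FROM customerA.bar AS a
    JOIN customerB.bar AS b ON a.id = b.id;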

http://daevid.com
