They'd all be identical DBs/schemas. And if we updated schema on one, we'd update all. We're talking like 100-200 operators total (they are airlines). They just have TONS of data each.
So one of my questions that I was hoping to hear answered (and maybe I missed it) is this: Are mySQL reports going to be faster in tables that are exclusive to a single operator? In other words, if we have 100 operators data in a single table (let's say millions of rows PER operator), are those queries any slower (assuming proper indexes and an operator ID and all that business). Or is the nature of B-Trees such that it shouldn't matter either way? Logic would dictate that the RDBMS has to sift through all the records that are NOT for a given operator (of which 99% of the data would not be, correct?), so that seems like the hard drive has to jump over gobs and blobs of INT/VARCHAR/TEXT/etc. that it doesn't need. OR, again, is the filesystem/RDMBS/innodb file structured in such a way that the index knows which inode to pop to instantly for each row, so irrelevant rows are negligible hits to seek/read times? And related, are there any limitations to the number of databases we can use in a join for internal reports (or perhaps the length of a single SQL statement). Obviously we would construct a query in PHP or Python or something to assemble all the operator 1 ... 200 database names, so it's not like we'd hand-craft that. But it could get a fairly lengthy string/SQL/query to pass off to mySQL. For a simple example, perhaps we want to know how many 'foo' there are in a given table per Operator, and then a total of all 'foo'. (and yes this contrived example could be done in a loop and tally, but you get the concept). -----Original Message----- From: Arthur Fuller <fuller.art...@gmail.com> To: Daevid Vincent <dae...@daevid.com> Cc: mysql <mysql@lists.mysql.com> Subject: Re: Separate customer databases vs all in one Date: Tue, 17 Mar 2009 21:17:00 -0400 Are these databases identical or merely similar? If they are structurally identical, I'd go for one database per customer. Then you have isolation, easy structure updates and above all, consistent front-end code, in whatever language that occurs. Just obtain the customer ID and then use the appropriate database. Everything else can remain the same. The only fly in the ointment concerns whether you'd ever have the customer need to cross databases. I would imagine that sort of thing is for internal use, not the customers. In that case, the performance hit if any won't impact upon the customer, just you. hth, Arthur On Tue, Mar 17, 2009 at 8:21 PM, Daevid Vincent <dae...@daevid.com> wrote: > I'm writing a report tool wherein we have many customers who subscribe > to this SaaS. There are millions of rows of data per customer. All > customers are islands from each other (of course). > > Are there any major issues or benefits between storing each customer in > their own database (with their own tables), or all lumped into a single > database? > > At first thought, it seems that by separating them, queries should be > faster no (as there is less data to sift though per customer)? It of > course makes upgrading table schema a wee bit more cumbersome, but a > simple loop and script can handle that easily enough. And since you can > query across databases, we can still make internal aggregate reports for > our own usage. > > For example: SELECT * FROM customerA.foo.bar JOIN customerB.foo.bar; or > we can use UNIONS etc. too. > > Consolidating them into one would seem to bloat the tables and slow > things down (or is the fact that mySQL uses B-Trees invalidate that > theory)? It also makes us have to have a customer_id entry in every > table basically (or some FK to distinguish who's data is who's). It also > feels like it could leak data if a malformed query were to get through, > although I'm not terribly worried about this as we do some heavy UAT > before pushing from DEV to TEST to PROD. > > Performance is a major factor concern here given our huge data sets > involved. Does joining across databases impose any speed/performance > hits vs. just joining across tables within a single database? > > http://daevid.com >