Re: Separate customer databases vs all in one
You only confirm my first-blush opinion: keep separate databases.

Regarding your second issue (joining across databases for reports), I would suggest that these operations are best done against an OLAP database, not the numerous OLTP databases. That is, create an additional database whose sole purpose is reporting, and which accumulates all the data from the numerous OLTP databases on some scheduled basis (the schedule dictated by stakeholders: how recent should the reports be? Will one week do? One day? One hour?).

The basic idea here is that reporting does aggregates and therefore necessarily does table scans, especially in your case. To place this burden on the OLTP databases is an error in design and, more importantly, a bottleneck in performance. My advice would be to separate the reporting tasks from the data-entry tasks: aggregate the data periodically in the OLAP database and base all your reports on it, not on the OLTP databases. This way you maximize data-entry and update speed while also maximizing reporting speed, since reports won't cause contention with the data-entry activities. This does mean the aggregate db is very large, but OTOH interrogating it won't impair the OLTP databases in the slightest.

So the big problem this scenario suggests is the granularity of the updates to the OLAP copy of the data. That's not for me to decide. Ask the stakeholders how recent the data must be and proceed from there.

Arthur
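The scheduled rollup Arthur describes might be sketched like this, with SQLite standing in for MySQL; the orders table, its columns, and the daily_sales rollup are hypothetical stand-ins, not anything from the thread:

```python
import sqlite3

def refresh_rollup(reporting_path, tenant_dbs):
    """Rebuild a per-tenant daily rollup table from each OLTP database.

    tenant_dbs maps a tenant name to its database file. This does a
    full rebuild on every run; a real schedule might apply deltas.
    """
    rep = sqlite3.connect(reporting_path)
    rep.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        " tenant TEXT, day TEXT, total REAL,"
        " PRIMARY KEY (tenant, day))")
    rep.execute("DELETE FROM daily_sales")  # full rebuild, for simplicity
    for tenant, path in tenant_dbs.items():
        src = sqlite3.connect(path)
        rows = src.execute(
            "SELECT order_day, SUM(amount) FROM orders GROUP BY order_day")
        rep.executemany(
            "INSERT INTO daily_sales VALUES (?, ?, ?)",
            ((tenant, day, total) for day, total in rows))
        src.close()
    rep.commit()
    return rep
```

All reporting queries then hit daily_sales only, so the table scans never touch the OLTP databases.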
Re: Separate customer databases vs all in one
I have an application and database doing this right now. We run both ways, a single db per customer and a single customer per db. Smaller customers are in a shared environment and larger customers have their own db; sometimes they even have their own server. We use a company_id field in each table to identify the data.

On 3/17/09, Arthur Fuller fuller.art...@gmail.com wrote: Are these databases identical or merely similar? If they are structurally identical, I'd go for one database per customer.

--
- Johnny Withers 601.209.4985 joh...@pixelated.net

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
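Johnny's shared-environment approach, a company_id column in every table that every query must filter on, can be sketched as follows; the accounts table and its other columns are illustrative, not from the thread:

```python
import sqlite3

def rows_for_company(conn, company_id):
    """Fetch one tenant's rows from a shared table.

    In the shared setup every table carries a company_id column, so
    every query is scoped by it; forgetting the filter is exactly the
    'leak data via a malformed query' risk the original post raises.
    """
    return conn.execute(
        "SELECT name, balance FROM accounts WHERE company_id = ?",
        (company_id,),
    ).fetchall()
```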
Re: Separate customer databases vs all in one
On Mar 18, 2009, at 11:47 AM, Johnny Withers wrote: I have an application and database doing this right now. We run both ways, a single db per customer and a single customer per db.

How do you go about table changes and such -- run a script that connects to each DB, or something fancier than that?
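The "script that connects to each DB" approach could look like this minimal sketch (SQLite as a stand-in for MySQL; callers supply the database list and the DDL statements):

```python
import sqlite3

def apply_to_all(db_paths, statements):
    """Run the same DDL statements against every tenant database.

    A minimal sketch of the loop-and-script approach: connect to each
    database in turn, apply the statements, and report which databases
    were updated.
    """
    applied = []
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            for stmt in statements:
                conn.execute(stmt)
            conn.commit()
            applied.append(path)
        finally:
            conn.close()
    return applied
```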
Re: Separate customer databases vs all in one
On Wed, March 18, 2009 06:29, Johnny Withers wrote: Every time we make a schema change in development we save an SQL script into a directory named in MMDD format. The scripts are named in MMDD_NNN_DESCRIPTION.SQL format, where NNN is the sequence number. This allows us to string the files together in the proper order to apply when an update goes out. We simply look at the last update date, combine all the scripts (either using command-line utilities or an app we wrote to do it) and apply the combined script to all the databases. Applying to all the databases is currently handled by hand each time, since we don't do it more than once a week and there are not a lot of databases (15 to 20 right now). I can see us needing an automated way to apply these scripts in the future; it'll probably involve a custom application that uses a settings file holding the connection string for each database. It'd simply loop over these connections and apply the scripts for a date range. We don't have a need to query multiple databases at a time. Since some of these databases reside on different servers, I don't even think we could -- unless we started using an engine that supports this. -jw

I realize it's probably a bit late for this, but wouldn't YYYYMMDD make more sense? That way they would sort in proper order.

On Wed, Mar 18, 2009 at 7:01 AM, Stephen Sunderlin vze80...@verizon.net wrote: How do you synchronize schema changes across the databases, and what method do you use for internal aggregate reporting across the databases?

Mar 18, 2009 06:47:52 AM, joh...@pixelated.net wrote: I have an application and database doing this right now. We run both ways, a single db per customer and a single customer per db.
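The naming scheme above works because zero-padded date prefixes sort lexicographically in chronological order, within a single year; a year prefix would make the sort correct across years too. A small sketch of ordering and combining such scripts (the file names are illustrative, and read_sql stands in for reading the files from disk):

```python
def apply_order(script_names):
    """Sort migration scripts named DATE_NNN_DESCRIPTION.SQL into
    apply order.  A plain lexicographic sort is chronological for
    zero-padded date prefixes -- but only within one year for MMDD.
    """
    return sorted(script_names)

def combined_script(script_names, read_sql):
    """String the scripts together in apply order into one update
    script.  read_sql maps a script name to its SQL text."""
    return "\n".join(read_sql(name) for name in apply_order(script_names))
```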
Re: Separate customer databases vs all in one
I suppose it would... but we have not had to (and hopefully never will have to) release more than 12 months' worth of updates at a time. If we do, can I contact you to get all these files renamed??? I can see it causing a problem at the end of each year, though. Dang. -jw

On Wed, Mar 18, 2009 at 11:22 AM, Wm Mussatto mussa...@csz.com wrote: I realize it's probably a bit late for this, but wouldn't YYYYMMDD make more sense? That way they would sort in proper order.
Re: Separate customer databases vs all in one
They'd all be identical DBs/schemas, and if we updated the schema on one, we'd update all. We're talking like 100-200 operators total (they are airlines). They just have TONS of data each.

So one of my questions that I was hoping to hear answered (and maybe I missed it) is this: are MySQL reports going to be faster in tables that are exclusive to a single operator? In other words, if we have 100 operators' data in a single table (let's say millions of rows PER operator), are those queries any slower (assuming proper indexes, an operator ID, and all that business)? Or is the nature of B-trees such that it shouldn't matter either way? Logic would dictate that the RDBMS has to sift through all the records that are NOT for a given operator (of which 99% of the data would not be, correct?), so it seems like the hard drive has to jump over gobs and blobs of INT/VARCHAR/TEXT/etc. that it doesn't need. OR, again, is the filesystem/RDBMS/InnoDB file structured in such a way that the index knows which inode to pop to instantly for each row, so irrelevant rows are negligible hits to seek/read times?

And related: are there any limitations to the number of databases we can use in a join for internal reports (or perhaps to the length of a single SQL statement)? Obviously we would construct the query in PHP or Python or something to assemble all the operator 1 ... 200 database names, so it's not like we'd hand-craft it. But it could get to be a fairly lengthy string/SQL/query to pass off to MySQL. For a simple example, perhaps we want to know how many 'foo' there are in a given table per operator, and then a total of all 'foo' (and yes, this contrived example could be done in a loop and tally, but you get the concept).

-----Original Message----- From: Arthur Fuller fuller.art...@gmail.com To: Daevid Vincent dae...@daevid.com Cc: mysql mysql@lists.mysql.com Subject: Re: Separate customer databases vs all in one Date: Tue, 17 Mar 2009 21:17:00 -0400

Are these databases identical or merely similar?
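On the B-tree question above: with an index whose leading column is the operator ID, the engine can in principle seek directly to that operator's key range rather than scanning the other operators' rows, so the shared table costs roughly an index traversal, not a pass over the other 99%. A toy demonstration, with SQLite standing in for MySQL/InnoDB and made-up table, column, and index names:

```python
import sqlite3

def demo_db():
    """Build a toy shared table with an index led by operator_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (operator_id INTEGER, payload TEXT)")
    conn.execute("CREATE INDEX idx_events_op ON events (operator_id)")
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     [(op, "row") for op in (1, 1, 2, 3, 3, 3)])
    conn.commit()
    return conn

def count_for_operator(conn, operator_id):
    """Count one operator's rows; the B-tree index lets the engine seek
    to that operator's key range instead of scanning everyone's rows."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE operator_id = ?",
        (operator_id,)).fetchone()
    return n
```

Inspecting the query plan (EXPLAIN in MySQL, EXPLAIN QUERY PLAN in SQLite) confirms whether the per-operator filter is actually served by the index.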
Separate customer databases vs all in one
I'm writing a report tool wherein we have many customers who subscribe to this SaaS. There are millions of rows of data per customer, and all customers are islands from each other (of course). Are there any major issues or benefits in storing each customer in their own database (with their own tables), versus all lumped into a single database?

At first thought, it seems that by separating them, queries should be faster, no (as there is less data to sift through per customer)? It of course makes upgrading table schema a wee bit more cumbersome, but a simple loop and script can handle that easily enough. And since you can query across databases, we can still make internal aggregate reports for our own usage. For example: SELECT * FROM customerA.foo AS a JOIN customerB.foo AS b ON a.bar = b.bar; or we can use UNIONs etc. too.

Consolidating them into one would seem to bloat the tables and slow things down (or does the fact that MySQL uses B-trees invalidate that theory?). It also makes us have to have a customer_id entry in basically every table (or some FK to distinguish whose data is whose). It also feels like it could leak data if a malformed query were to get through, although I'm not terribly worried about this as we do some heavy UAT before pushing from DEV to TEST to PROD.

Performance is a major concern here given the huge data sets involved. Does joining across databases impose any speed/performance hits vs. just joining across tables within a single database?

http://daevid.com
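The cross-database aggregate described above can be assembled programmatically, as the post suggests. A sketch using SQLite's ATTACH as a stand-in for MySQL's database.table qualification; the tenant names and the foo table are illustrative:

```python
import sqlite3

def per_tenant_counts(tenant_dbs, table):
    """Count rows in one table across many per-tenant databases.

    ATTACH gives each database file a name so tables can be qualified
    as tenant.table; the generated UNION ALL is the kind of query the
    post describes building in PHP or Python for all operators.
    """
    conn = sqlite3.connect(":memory:")
    for name, path in tenant_dbs.items():
        conn.execute(f"ATTACH DATABASE ? AS {name}", (path,))
    parts = [
        f"SELECT '{name}' AS tenant, COUNT(*) AS n FROM {name}.{table}"
        for name in tenant_dbs
    ]
    sql = " UNION ALL ".join(parts)
    return dict(conn.execute(sql).fetchall())
```

A grand total across all tenants is then just the sum of the returned values.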
Re: Separate customer databases vs all in one
Are these databases identical or merely similar? If they are structurally identical, I'd go for one database per customer. Then you have isolation, easy structure updates and, above all, consistent front-end code, in whatever language that occurs. Just obtain the customer ID and then use the appropriate database; everything else can remain the same. The only fly in the ointment is whether you'd ever need to cross databases for a customer. I would imagine that sort of thing is for internal use, not the customers. In that case the performance hit, if any, won't impact the customer, just you. hth, Arthur

On Tue, Mar 17, 2009 at 8:21 PM, Daevid Vincent dae...@daevid.com wrote: I'm writing a report tool wherein we have many customers who subscribe to this SaaS. There are millions of rows of data per customer.
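Arthur's "obtain the customer ID and then use the appropriate database" routing might be sketched like this; the customer<N> naming scheme and the connection factory are assumptions for illustration, not anything from the thread:

```python
def database_for_customer(customer_id):
    """Map a customer ID to its database name (a hypothetical scheme)."""
    return f"customer{customer_id}"

class TenantRouter:
    """Hand out one cached connection per tenant database, so the rest
    of the front-end code stays identical across customers."""

    def __init__(self, connect):
        # connect is any callable taking a database name, e.g. a thin
        # wrapper around the driver's connect(db=...) call.
        self._connect = connect
        self._conns = {}

    def connection(self, customer_id):
        db = database_for_customer(customer_id)
        if db not in self._conns:
            self._conns[db] = self._connect(db)
        return self._conns[db]
```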