Re: Separate customer databases vs all in one

2009-03-19 Thread Arthur Fuller
You only confirm my first-blush opinion: keep separate databases. Regarding
your second issue (joining across databases for reports), I would suggest
that these operations are best done against an OLAP database, not the
numerous OLTP databases. That is, create an additional database whose sole
purpose is reporting, and which accumulates the data from the numerous
OLTP databases on some scheduled basis (the schedule dictated by the
stakeholders -- how recent should the reports be? Will one week do? One
day? One hour?).

The basic idea here is that reporting does aggregates and therefore
necessarily does table scans, especially in your case. To place this burden
on the OLTP databases is a design error and, more important, a performance
bottleneck. My advice would be to separate the reporting tasks from the
data-entry tasks: aggregate the data periodically into the OLAP database
and base all your reports on it, not on the OLTP databases. This way you
maximize data-entry and update speed while also maximizing reporting speed
(since reporting won't cause contention with the data-entry activities).

This would mean that the aggregate db is very large, but OTOH interrogating
it won't impair the OLTP databases in the slightest. The big question this
scenario raises is the granularity of the updates to the OLAP copy of the
data. That's not for me to decide: ask the stakeholders how recent the
data must be and proceed from there.
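
A minimal sketch of one such scheduled roll-up statement, assuming a
reporting database named olap and per-customer OLTP databases (all table
and column names are invented for illustration; run one statement per
OLTP database, from cron or a scheduled event):

INSERT INTO olap.daily_sales (customer, sale_date, total)
SELECT 'customerA', DATE(created_at), SUM(amount)
FROM customerA.sales
WHERE created_at >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY DATE(created_at);
-- ...repeat (or loop in a script) for each OLTP database.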

Arthur


Re: Separate customer databases vs all in one

2009-03-18 Thread Johnny Withers
I have an application and database doing this right now. We run it both
ways: many customers in a single db, and a single customer per db. Smaller
customers are in a shared environment and larger customers have their own
db; sometimes they even have their own server. We use a company_id field
in each table to identify the data.
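
Illustratively, a shared-environment table in that style might look like
the following (table and column names are invented, not the actual schema):

CREATE TABLE account (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  company_id INT UNSIGNED NOT NULL,  -- identifies whose data each row is
  name       VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_company (company_id)       -- every shared-env query filters on this
);

SELECT * FROM account WHERE company_id = 42;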


-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net




Re: Separate customer databases vs all in one

2009-03-18 Thread Morten


On Mar 18, 2009, at 11:47 AM, Johnny Withers wrote:


I have an application and database doing this right now. We run it both
ways: many customers in a single db, and a single customer per db. Smaller
customers are in a shared environment and larger customers have their own
db; sometimes they even have their own server. We use a company_id field
in each table to identify the data.


How do you go about table changes and such? Do you run a script that
connects to each DB, or something fancier than that?







Re: Re: Separate customer databases vs all in one

2009-03-18 Thread Wm Mussatto
On Wed, March 18, 2009 06:29, Johnny Withers wrote:
 Every time we make a schema change in development we save an SQL script
 into a directory named in MMDD format. The scripts are named in
 MMDD_NNN_DESCRIPTION.SQL format, where NNN is the sequence number. This
I realize it's probably a bit late for this, but wouldn't YYYYMMDD make
more sense?  That way they would sort in proper order.
 allows us to string the files together in the proper order to apply when
 an update goes out. We simply look at the last update date, combine all
 the scripts (either using cmd line utilities or an app we wrote to do it)
 and apply this combined script to all the databases. The application to
 all databases is currently handled by hand each time since we don't
 release more than once a week and there are not a lot of databases (15
 to 20 right now).
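
(A script in that scheme might look something like the following; the
table and columns are invented for illustration.)

-- 0318_001_add_account_status.sql
ALTER TABLE account
  ADD COLUMN status TINYINT NOT NULL DEFAULT 1;

-- 0318_002_backfill_status.sql
UPDATE account SET status = 2 WHERE closed_at IS NOT NULL;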

 I can see us needing an automated way to apply these scripts in the future
 and it'll probably involve a custom application that makes use of a
 settings
 file that has the connection string for each database. It'd simply loop
 these connections and apply the scripts over a date range.

 We don't have a need for being able to query multiple databases at a time.
 Since some of these databases reside on different servers, I don't even
 think we could do that -- unless we started using an engine that supports
 this.
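
(For what it's worth, MySQL's FEDERATED storage engine is one such option:
it presents a table that lives on another server as if it were local. A
minimal sketch, with invented server, credentials, and table names:

CREATE TABLE remote_account (
  id         INT UNSIGNED NOT NULL,
  company_id INT UNSIGNED NOT NULL,
  name       VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://report_user:secret@db2.example.com:3306/customerB/account';

The column definitions must match the remote table's; queries against
remote_account are forwarded to the other server.)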

 -jw
 On Wed, Mar 18, 2009 at 7:01 AM, Stephen Sunderlin
 vze80...@verizon.net wrote:


  How do you synchronize schema changes across the databases, and what
  method do you use for internal aggregate reporting across the databases?




Re: Re: Separate customer databases vs all in one

2009-03-18 Thread Johnny Withers
I suppose it would... but we have not had to release more than 12 months'
worth of updates at a time (and hopefully never will). If we do, can I
contact you to get all these files renamed???
I can see it causing a problem at the end of each year, though.

Dang.

-jw

Re: Separate customer databases vs all in one

2009-03-18 Thread Daevid Vincent
They'd all be identical DBs/schemas, and if we updated the schema on one,
we'd update all. We're talking about 100-200 operators total (they are
airlines). They just have TONS of data each.

So one of my questions that I was hoping to hear answered (and maybe I
missed it) is this:

Are MySQL reports going to be faster in tables that are exclusive to a
single operator? In other words, if we have 100 operators' data in a
single table (let's say millions of rows PER operator), are those
queries any slower (assuming proper indexes and an operator ID and all
that business)? Or is the nature of B-trees such that it shouldn't
matter either way? Logic would dictate that the RDBMS has to sift
through all the records that are NOT for a given operator (which would
be ~99% of the data, correct?), so it seems like the hard drive has to
jump over gobs and blobs of INT/VARCHAR/TEXT/etc. that it doesn't need.
OR, again, is the filesystem/RDBMS/InnoDB file structured in such a way
that the index knows which inode to pop to instantly for each row, so
irrelevant rows are negligible hits to seek/read times?
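
(Sketching the indexing point: a composite index that leads with the
operator column keeps each operator's rows adjacent in the index, so a
query scoped to one operator range-scans only that slice rather than
sifting the whole table. Table and column names here are invented:

ALTER TABLE flights
  ADD INDEX idx_operator_departed (operator_id, departed_at);

-- Touches only operator 42's index range, not the other ~99% of rows:
SELECT COUNT(*)
FROM flights
WHERE operator_id = 42
  AND departed_at >= '2009-01-01';

With InnoDB, putting operator_id first in the primary key goes further
still, since InnoDB clusters the rows themselves by primary key.)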

And related: are there any limitations on the number of databases we can
use in a join for internal reports (or on the length of a single SQL
statement)? Obviously we would construct the query in PHP or Python or
something to assemble all the operator 1 ... 200 database names, so it's
not like we'd hand-craft it. But it could become a fairly lengthy
string/SQL/query to pass off to MySQL. For a simple example, perhaps we
want to know how many 'foo' there are in a given table per operator, and
then a total of all 'foo'. (And yes, this contrived example could be done
with a loop and a tally, but you get the concept.)
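
(Sketch of such a generated query: since MySQL caps a single join at 61
tables, a 200-operator report reads more naturally as a UNION ALL feeding
one aggregate. Database and table names are invented; the branches would
be emitted by the PHP/Python layer.

SELECT operator, COUNT(*) AS foo_count
FROM (
    SELECT 'op001' AS operator FROM op001.foo
    UNION ALL
    SELECT 'op002' AS operator FROM op002.foo
    -- ... one branch per operator database ...
) AS all_ops
GROUP BY operator WITH ROLLUP;
-- The ROLLUP row (operator = NULL) gives the total of all 'foo'.

The statement's length is bounded by max_allowed_packet on the server,
which is configurable.)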





Separate customer databases vs all in one

2009-03-17 Thread Daevid Vincent
I'm writing a report tool wherein we have many customers who subscribe
to this SaaS. There are millions of rows of data per customer. All
customers are islands from each other (of course).

Are there any major issues or benefits between storing each customer in
their own database (with their own tables), or all lumped into a single
database?

At first thought, it seems that by separating them, queries should be
faster, no (as there is less data to sift through per customer)? It of
course makes upgrading the table schema a wee bit more cumbersome, but a
simple loop and script can handle that easily enough. And since you can
query across databases, we can still make internal aggregate reports for
our own usage.

For example: SELECT * FROM customerA.foo JOIN customerB.foo; or we can
use UNIONs etc. too.

Consolidating them into one would seem to bloat the tables and slow
things down (or does the fact that MySQL uses B-trees invalidate that
theory)? It also makes us have to have a customer_id entry in basically
every table (or some FK to distinguish whose data is whose). It also
feels like it could leak data if a malformed query were to get through,
although I'm not terribly worried about this as we do some heavy UAT
before pushing from DEV to TEST to PROD.

Performance is a major concern here given the huge data sets involved.
Does joining across databases impose any speed/performance hit vs. just
joining across tables within a single database?

http://daevid.com


Re: Separate customer databases vs all in one

2009-03-17 Thread Arthur Fuller
Are these databases identical or merely similar? If they are structurally
identical, I'd go for one database per customer. Then you have isolation,
easy structure updates and, above all, consistent front-end code, in
whatever language that occurs. Just obtain the customer ID and then use
the appropriate database. Everything else can remain the same.
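
In practice that can be as small as switching the default database per
connection; a sketch, with an invented naming scheme:

USE customer_0042;  -- database name derived from the customer ID
SELECT * FROM invoices WHERE created_at >= '2009-03-01';

Everything after the USE runs unchanged against whichever customer's
database was selected.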

The only fly in the ointment is whether the customer would ever need to
cross databases. I would imagine that sort of thing is for internal use,
not the customers'. In that case the performance hit, if any, won't
impact the customer, just you.

hth,
Arthur
