[SQL] idea for a geographically distributed database: how best to implement?

2005-11-17 Thread Andy Ballingall
Hello,

I've got a database for a website which is a variant of the 'show stuff near
to me' sort of thing.

Rather than host this database on a single server, I have a scheme in mind
to break the database up geographically so that each one can run comfortably
on a small server, but I'm not sure about the best way of implementing it.

Here's the scheme:


Imagine that the country is split into an array of square cells.
Each cell contains a database that stores information about people who live
in the area covered by the cell.

There's one problem with this scheme. What happens if you live near the edge
of a cell?

My solution is that any inserted data which lies near to the edge of cell A
is *also* inserted in the database of the relevant neighbouring cell - let's
say cell B.

Thus, if someone lives in cell B, but close to the border with cell A,
they'll see the data that is geographically close to 
them, even if it lies in cell A.



Is this a common pattern?

I could, of course, simply find every insert, update and delete in the
application and alter the code to explicitly update all the relevant
databases, but is there a more elegant way of simply saying: "Do this
transaction on both Database A and Database B" monotonically?

I've had a look at some replication solutions, but they all seem to involve
replicating an entire database. The advantage of my scheme is that if I can
distribute my application over large numbers of small servers, I'll end up
with more bangs for the buck, and it'll be much easier to manage growth by
managing the number of servers, and number of cells hosted on each server.

Thanks for any suggestions!
Andy Ballingall


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] idea for a geographically distributed database: how best to implement?

2005-11-17 Thread Andy Ballingall
Hi again,

I don't think I've explained my idea well enough:

>You might want to consider using latitude and longitude or zip codes or 
taking more of a traditional "GIS" approach rather than duplicating data 
across redundant databases.

I do use coordinates to position data. The databases aren't redundant. Each
database serves a particular region, the aim being to ensure that no one
database gets too big for the machines in my farm to cope with.

>Another issue is that you end up having to query every database to find 
proximity... Suppose you have 500 "cells" ? You now have to invoke some sort

of RPC mechanism on 499 other RDBMS, manage 500 user names and passwords, 
retrieve potentially 500 recordsets, merge the data, etc...


1. It doesn't matter how many cells there are. A user's details are only
found in the database covering the area he lives, and possibly in the
neighbouring databases (at most 3, if he lives in the corner of the cell).

2. The neighbouring cells of cell A never change, therefore you don't need
to do anything clever to identify which other databases to write to in these
edge conditions, since when the user logs in, the relationship information
is read just once (just like the user's name and picture)

3. No merging ever needs to take place during queries. Here's an example.
Let's say, you, a resident of cell A, say 'show me who lives nearby!'. The
database of cell A has *all* the information. Why? Well, when a resident in
nearby cell B registered, his details were also added to cell A at that
time. Database A doesn't know, or care, how the data it holds got there.

4. Which database do I use to serve all your requests? Well, yes, once, I
have to work out which one it is, based on where you are, but that's almost
no work at all (see next point...)


5. I didn't mention before that my scheme *does* also require a national
database, but the size of that is well defined, as the only jobs it needs to
do are:
a) Know the names of all the places,
b) Know the definition of the existing cells, and their positions - to make
it easy to manage the cell structure, and to know where to send you when you
login
c) A set of global sequences used to generate unique ids for all the other
databases. When your record goes into cell A and cell B, the id of this
record is the *same*...

>Your problems will continue to get more and more complex... You are better 
off with a single RDBMS and a single schema...

I currently have a single database. It is clearly simpler. However, I also
have a headache about what machinery to buy so that:
a) it starts off cheap
b) It is easy to expand, if and when the website takes off.

Yes, if I knew how many people would use this service, and how quickly it
would expand, I'd be able to raise the capital to buy a huge nationwide
server. 

But I don't. So the answer 'how big does your DB server need to be' is
simply 'I can only make a very poor guess'. This doesn't look good in a
business plan. Far better if I can demonstrate an application which can
start off on a single simple server, and can elegantly expand with the user
base, with the farm of servers growing over time.


Unlike many data sets, mine is almost totally partitioned geographically.
There is only *one* little detail - that of visibility of data in
neighbouring cells, and that is sorted out with my idea of duplicating
information between neighbours.

Hope that fills in some gaps...

Thanks for your comments,
Andy



"Andy Ballingall" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Hello,
>
> I've got a database for a website which is a variant of the 'show stuff 
> near
> to me' sort of thing.
>
> Rather than host this database on a single server, I have a scheme in mind
> to break the database up geographically so that each one can run 
> comfortably
> on a small server, but I'm not sure about the best way of implementing it.
>
> Here's the scheme:
>
> 
> Imagine that the country is split into an array of square cells.
> Each cell contains a database that stores information about people who 
> live
> in the area covered by the cell.
>
> There's one problem with this scheme. What happens if you live near the 
> edge
> of a cell?
>
> My solution is that any inserted data which lies near to the edge of cell 
> A
> is *also* inserted in the database of the relevant neighbouring cell - 
> let's
> say cell B.
>
> Thus, if someone lives in cell B, but close to the border with cell A,
> they'll see the data that is geographically close to
> them, even if it lies in cell A.
>
> 
>
> Is this a common pattern?
>
> I could, of course, sim

Re: [SQL] idea for a geographically distributed database: how best

2005-11-17 Thread Andy Ballingall


>Andy, i agree with what codeWarrior says.
But if you are interested in replication, dbmirror is very elegant
(altho not as trendy) simple, and highly customizable replication 
solution.

I have heavily modified dbmirror to even support
Asynchronous Conditional row grained 
Foreign key dependency Traversal oriented 
Lazy Replication!!!
(which ofcourse nobody wants :)


I'll go and check it out, so even if I don't want to use it, I'll at least
know what it means! 

(I considered suggesting using an acronym for it, but 'ACRGFKDTOLR' made my
lips bleed when I tried to say it).

Thanks,
Andy


> 
> 
> 
> 
> "Andy Ballingall" <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
> > Hello,
> >
> > I've got a database for a website which is a variant of the 'show stuff 
> > near
> > to me' sort of thing.
> >
> > Rather than host this database on a single server, I have a scheme in
mind
> > to break the database up geographically so that each one can run 
> > comfortably
> > on a small server, but I'm not sure about the best way of implementing
it.
> >
> > Here's the scheme:
> >
> > 
> > Imagine that the country is split into an array of square cells.
> > Each cell contains a database that stores information about people who 
> > live
> > in the area covered by the cell.
> >
> > There's one problem with this scheme. What happens if you live near the 
> > edge
> > of a cell?
> >
> > My solution is that any inserted data which lies near to the edge of
cell 
> > A
> > is *also* inserted in the database of the relevant neighbouring cell - 
> > let's
> > say cell B.
> >
> > Thus, if someone lives in cell B, but close to the border with cell A,
> > they'll see the data that is geographically close to
> > them, even if it lies in cell A.
> >
> > 
> >
> > Is this a common pattern?
> >
> > I could, of course, simply find every insert, update and delete in the
> > application and alter the code to explicitly update all the relevant
> > databases, but is there a more elegant way of simply saying: "Do this
> > transaction on both Database A and Database B" monotonically?
> >
> > I've had a look at some replication solutions, but they all seem to 
> > involve
> > replicating an entire database. The advantage of my scheme is that if I 
> > can
> > distribute my application over large numbers of small servers, I'll end
up
> > with more bangs for the buck, and it'll be much easier to manage growth
by
> > managing the number of servers, and number of cells hosted on each
server.
> >
> > Thanks for any suggestions!
> > Andy Ballingall
> >
> >
> > ---(end of broadcast)---
> > TIP 3: Have you checked our extensive FAQ?
> >
> >   http://www.postgresql.org/docs/faq
> > 
> 
> 
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
> 

-- 
-Achilleus


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



-- 
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.3/173 - Release Date: 16/11/2005



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [SQL] idea for a geographically distributed database: how best to implement?

2005-11-18 Thread Andy Ballingall

Thanks Aidan,

That's very interesting! I've not used slony yet, so I'll setup some tests
and try exactly this mechanism.

What I like about it is that the changes that have to occur to either add a
cell or to split an existing cell into two to manage growing traffic have a
localised effect on the web of database interdependencies, making it more
straightforward to do maintenance without the website being taken offline
for most people. They'd only notice downtime if they happened to live in or
next to the cells being messed around with. Not having to touch the actual
web application is nice.

The other advantage over my manual hack is that the replicated data is just
that - replicated, so when each database does a backup, it's only the master
data that is getting backed up and restored. In my manual scheme, following
restoration after a node failure,  I'd have to do a postprocessing step to
work out which restored data was actually data slaved from a neighbouring
cell, and refresh it in case the neighbouring data has been updated since
the node went down.

(I hadn't considered backup and restore properly up till now, but I'm going
to add that to the list of good reasons for taking advantage of my
'distantly disjoint' data set. If my national data is shared between 500
cells, then each database is going to be titchy, and backup will take 'no
time at all', as in parallel, 500 streams of database data flow first to
node local storage and thence to safer, remote storage)

Cheers,
Andy

-

-Original Message-
From: Aidan Van Dyk [mailto:[EMAIL PROTECTED] 
Sent: 17 November 2005 19:54
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: idea for a geographically distributed database: how best to
implement?

Andy,

This is  *very* interesting scenario.  Definitely peaked my interest on this
one.

I haven't tried this scenario before, but I have used "inheritted" tables to
do "splitups".  Basically, just partition your data, and replicate certain
partitions between various neighbours.
  http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

With Slony, you can get "master slave" replication of particular tables
"sets" (not necessarily whole databases).

So I haven't developped this idea, or tried it, but you might be able to
play with it and make it work.

On *each* database, you have a head table, with no records:
  CREATE TABLE complete ();

And a bunch of "cell" tables that inherit the "complete" table:
  CREATE TABLE cell_a () INHERITS "complete";
  CREATE TABLE cell_b () INHERITS "complete";

Basically, you *always* work on the "complete" table, with the following
rules:

CREATE RULE cell_replicate_insert ON INSERT TO complete DO INSTEAD
INSERT INTO cell_a (...);
CREAT RULE cell_replicate_update ON UPDATE TO complete DO INSTEAD
UPDATE cell_a ...;

Use SlonyI to replecate cella to NODE B (master->slave), and on NODE B, do
something similar (except work on b, and replicate b -> NODE A).

This way, all your web/interface code everywhere is using the "complete"
table for any insert/update/select.  When insert/updateing, it will always
go to the "local" table, which Slony "replicates" out to the neighbouring
nodes.  And when selecting on it (the "complete" table), you get data from
your local table, and any "replicated" neighbouring node's tables which are
in the local schema inheriting from "complete" and SlonyI is updating.

Good luck, and if you get anywhere with this, let me know!

Andy Ballingall wrote:

> Hello,
> 
> I've got a database for a website which is a variant of the 'show stuff
> near to me' sort of thing.
> 
> Rather than host this database on a single server, I have a scheme in mind
> to break the database up geographically so that each one can run
> comfortably on a small server, but I'm not sure about the best way of
> implementing it.
> 
> Here's the scheme:
> 
> 
> Imagine that the country is split into an array of square cells.
> Each cell contains a database that stores information about people who
> live in the area covered by the cell.
> 
> There's one problem with this scheme. What happens if you live near the
> edge of a cell?
> 
> My solution is that any inserted data which lies near to the edge of cell
> A is *also* inserted in the database of the relevant neighbouring cell -
> let's say cell B.
> 
> Thus, if someone lives in cell B, but close to the border with cell A,
> they'll see the data that is geographically close to
> them, even if it lies in cell A.
> 
> 
> 
> Is this a common pattern?
> 
> I cou

[SQL] idea for a geographically distributed database: how best to implement?

2005-11-18 Thread Andy Ballingall








Hello,

 

I’ve got a database
for a website which is a variant of the ‘show stuff near to me’
sort of thing.

 

Rather than host this
database on a single server, I have a scheme in mind to break the database up
geographically so that each one can run comfortably on a small server, but
I’m not sure about the best way of implementing it.

 

Here’s the scheme:

 



Imagine that the country is
split into an array of square cells.

Each cell contains a
database that stores information about people who live in the area covered by
the cell.

 

There’s one problem
with this scheme. What happens if you live near the edge of a cell?

 

My solution is that any
inserted data which lies near to the edge of cell A is *also* inserted in the
database of the relevant neighbouring cell – let’s say cell B.

 

Thus, if someone lives in
cell B, but close to the border with cell A, they’ll see the data that is
geographically close to them, even if it lies in cell A.

 



 

Is this a common pattern?

 

I could, of course, simply
find every insert, update and delete in the application and alter the code to
explicitly update all the relevant databases, but is there a more elegant way
of simply saying: “Do this transaction on both Database A and Database
B” monotonically?

 

I’ve had a look at
some replication solutions, but they all seem to involve replicating an entire
database. The advantage of my scheme is that if I can distribute my application
over large numbers of small servers, I’ll end up with more bangs for the
buck, and it’ll be much easier to manage growth by managing the number of
servers, and number of cells hosted on each server.

 

Thanks for any suggestions!

Andy Ballingall

 

'pgsql-sql@postgresql.org'








[SQL] Is it possible to redirect an update/insert/delete to a different table?

2005-11-19 Thread Andy Ballingall








Hello,

 

Is there a way that I can specify that a request to change
to a particular table is redirected instead to a different table?

 

I’ve looked through rules, and as far as I can make
out, they are only useful for explicit actions. I’m looking for something
that behaves as though it simply substitutes the table name for a different
table name before executing the command, no matter what the command looks like.

 

Thanks

Andy Ballingall








Re: [SQL] Is it possible to redirect an update/insert/delete to a different table?

2005-11-20 Thread Andy Ballingall
Hello Peter,

I'm glad it's possible, but I can't see how from the documentation.

Say if I have a table called 'apples' and a table called 'pears'.

What would the rule look like that would remap all updates on apples so that
they were applied to pears instead?

Thanks,
Andy

-----

Andy Ballingall wrote:
> I've looked through rules, and as far as I can make out, they are
> only useful for explicit actions. I'm looking for something that
> behaves as though it simply substitutes the table name for a
> different table name before executing the command, no matter what the
> command looks like.

You can write a couple of rules for that.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



-- 
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.4/176 - Release Date: 20/11/2005



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Is it possible to redirect an update/insert/delete to a different table?

2005-11-20 Thread Andy Ballingall
Hello Jaime,

I'm still not quite clear.

Say I have a number of different updates on a table 'apples' in my code,
including:

UPDATE apples set pips=6 and color='yellow' where id=3;
UPDATE apples set size=10 where id=6;

What would a rule look like which, when *any* update is attempted on the
apples table, will instead apply the update to a different table - 'pears'.

I get this far:

CREATE rule pears_instead_of_apples 
AS ON UPDATE TO apples
DO INSTEAD UPDATE INTO pears .;

What do I put where the . is, so that the rule will transparently update
the pears table with whatever values happened to be defined by the original
update command? Is there a special keyword that I've missed?

Regards,
Andy Ballingall


-Original Message-
From: Jaime Casanova [mailto:[EMAIL PROTECTED] 
Sent: 20 November 2005 14:23
To: [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org
Subject: Re: Is it possible to redirect an update/insert/delete to a
different table?

> I've looked through rules, and as far as I can make out, they are only
> useful for explicit actions. I'm looking for something that behaves as
> though it simply substitutes the table name for a different table name
> before executing the command, no matter what the command looks like.
>

Make 3 rules (INSERT/UPDATE/DELETE), 4 if you want SELECT as well...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)



-- 
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.4/176 - Release Date: 20/11/2005



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Is it possible to redirect an update/insert/delete to a different table?

2005-11-20 Thread Andy Ballingall
Hi Andreas,

The rule you've given only works for an update which changes the name.

If I do another update which changed the colour instead of the name, that
rule wouldn't do the right thing.

Instead, I'm looking for something which, with a single 'rule' (or whatever
the mechanism ends up being), intercepts *any* update on apples, and applies
the changes to the pears table instead, as if the only change that occurred
was a change of table name.

I can achieve this in the application which generates the sql commands, but
someone else suggested it was possible with rules, but it may not be the
case.

Thanks!
Andy




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Andreas Kretschmer
Sent: 20 November 2005 16:17
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Is it possible to redirect an update/insert/delete to a
different table?

Andy Ballingall <[EMAIL PROTECTED]> schrieb:

> Hello Peter,
> 
> I'm glad it's possible, but I can't see how from the documentation.
> 
> Say if I have a table called 'apples' and a table called 'pears'.
> 
> What would the rule look like that would remap all updates on apples so
that
> they were applied to pears instead?

create rule apples_pears_update as on update to apples do instead update
pears set name= NEW.name where id=NEW.id ;

test=# select * from apples ;
 id | name
+--
  1 | a
(1 row)

test=# select * from pears ;
 id | name
+--
  1 | b
(1 row)

test=# update apples set name = 'c' where id = 1;
UPDATE 1
test=# select * from pears ;
 id | name
+--
  1 | c
(1 row)


http://www.postgresql.org/docs/8.1/interactive/rules-update.html

HTH, Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 6: explain analyze is your friend



-- 
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.13.4/176 - Release Date: 20/11/2005



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [SQL] Is it possible to redirect an update/insert/delete to a different table?

2005-11-20 Thread Andy Ballingall
>Try it. [snipped example]

Ah. Basically, you set up the rule to assign every column, and if the update
doesn't redefine some columns, then it still works. I didn't understand that
you could get the rule to work generically like this.

I'll presume that the rule will need amending if the table column definition
later changes. (E.g. if I add 'stalk_length' to my apples and pears
tables)...

Thanks very much for your help.

Andy



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [SQL] idea for a geographically distributed database: how best to implement?

2005-11-22 Thread Andy Ballingall

David Bath wrote:
> There are a couple of philosophical perspectives I've come across in
> previous
> work with cadastral data that may be useful...[snipped]

Thanks, David

In this particular application, structures such as postcode sectors,
administrative boundaries etc. are not really of much importance, as most
stuff is a simple coordinate based searches. Even with the problem
partitioned into disjoint regions, within each region, the search remains
trivial, as all the data that the user is allowed to access will be stored
with that region (this includes data replicated from neighbouring regions).

In this context, the interesting task isn't so much the actual database
searching, or the exact definition of the disjoint regions.

The interesting task is to define a system which can dynamically remap the
hosting of regions to specific servers, so that no one server gets too busy.
As demand grows, I simply plug in more 4 blades and press the 'reconfigure'
button (Sorry - I was dreaming for a moment...)

The only limiters are the number of servers available and the activity
within a single region (which must be servable by a single server), but
given the highly localised nature of the application, the regions can be
very small, and I don't expect to ever see a region with more than 1GB of
data - the aim being for all the data to be resident in RAM.

So far, I've already seen some issues. I've been looking at slony-1 to
handle the replication between adjacent regions, and not only is it
asynchronous (I was hoping otherwise...slony-2 seems a long way off), but
changing the db schema has ramifications too. (I.e. changing the schema
means redefining each replication). Still - no show stoppers yet.

Thanks for your insights,

Andy


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly