Re: [GENERAL] Database cluster?

2000-12-01 Thread Valter Mazzola




From: "Gordan Bobic"
Subject: Re: [GENERAL] Database cluster?
Date: Fri, 1 Dec 2000 10:13:55 -

  I've successfully patched linux kernel 2.2.17 with DIPC and modified 
postgresql's src (src/backend/storage/ipc/ipc.c) to create distributed 
shm and sem.

Please forgive my ignorance (I haven't used Postgres for that long), but 
what are shm and sem?


shared memory and semaphores

  The strategy is then to start a postgresql that creates shm and sem on  
ONE machine, then start other postgres on other machines on the cluster  
that create NO shared structures ( there is a command line flag to do 
this).

So, one "master" and lots of "slaves", right?


no, every machine is identical to the others; the only difference is 
that only ONE machine creates the shared memory and semaphores (distributed 
over the network by DIPC).
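The "one instance creates, every other instance attaches" arrangement can be illustrated on a single machine. The following is a minimal Python sketch that works only as an analogy. It uses the standard library's `multiprocessing.shared_memory` (Python 3.8+) rather than the System V calls that ipc.c makes. Threads stand in for separate postgres backends, a `threading.Semaphore` stands in for the semaphore set, and nothing is distributed the way DIPC would distribute it. The names `backend`, `run_cluster` and `COUNTER_BYTES` are hypothetical.

```python
import threading
from multiprocessing import shared_memory

COUNTER_BYTES = 8  # the shared segment holds a single 64-bit counter


def backend(shm_name: str, sem: threading.Semaphore, iterations: int) -> None:
    """A simulated backend: attach to the existing segment and bump the counter."""
    shm = shared_memory.SharedMemory(name=shm_name)  # attach only, never create
    try:
        for _ in range(iterations):
            with sem:  # binary semaphore guards the read-modify-write
                value = int.from_bytes(shm.buf[:COUNTER_BYTES], "little")
                shm.buf[:COUNTER_BYTES] = (value + 1).to_bytes(COUNTER_BYTES, "little")
    finally:
        shm.close()  # detach; removing the segment is the creator's job


def run_cluster(n_backends: int = 4, iterations: int = 1000) -> int:
    """The single 'creating' instance: allocate shm and semaphore, let others attach."""
    shm = shared_memory.SharedMemory(create=True, size=COUNTER_BYTES)
    sem = threading.Semaphore(1)  # one permit, i.e. used as a mutex
    try:
        shm.buf[:COUNTER_BYTES] = bytes(COUNTER_BYTES)  # start the counter at zero
        workers = [
            threading.Thread(target=backend, args=(shm.name, sem, iterations))
            for _ in range(n_backends)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return int.from_bytes(shm.buf[:COUNTER_BYTES], "little")
    finally:
        shm.close()
        shm.unlink()  # destroy the segment once every backend has detached
```

If the `with sem:` line is removed, concurrent increments can overwrite each other, and the total can come out below `n_backends * iterations`. The semaphores exist to prevent that kind of lost update.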


  Then you can connect to any of the postgres instances on your cluster, 
for example in round robin.
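In this context, round robin means that each new client connection goes to the next node in a fixed rotation. Below is a minimal client-side sketch. The node addresses and the `RoundRobinDispatcher` class are hypothetical illustrations, not part of PostgreSQL.

```python
import threading
from itertools import cycle


class RoundRobinDispatcher:
    """Hands out cluster nodes in strict rotation; safe to call from several threads."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._ring = cycle(list(nodes))  # copy so later edits to `nodes` don't leak in
        self._lock = threading.Lock()    # guard next() on the shared iterator

    def next_node(self):
        with self._lock:
            return next(self._ring)
```

For example, `RoundRobinDispatcher(["node1:5432", "node2:5432"])` alternates between the two addresses. In the design described above, every node sees the same DIPC-distributed shared memory. The dispatcher therefore presumably does not need to know which node holds which data, so plain rotation should be sufficient.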

Hmm... But is this really what we want to do? This is less than ideal for 
several reasons (if I understand what you're saying correctly). Replication 
is off-line for a start, and it only works well for a system that has few 
inserts and lots of selects, probably from a lot of different users. 
Probably a good thing for applications like web search engines, but not 
necessarily for much else.

*** it isn't replication. It's that your cluster behaves like a single 
computer. You modify the 'OS' (GFS + DIPC), not postgresql.



  Another issue is the data files; GFS seems promising. But postgresql uses 
fcntl, and GFS (globalfilesystem.org) doesn't support it yet. A 
distributed filesystem with locking etc. is required. Ideas?
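fcntl() record locks let several processes on one machine coordinate access to the same file: the kernel refuses a lock that conflicts with one already held. The sketch below shows this behaviour on a local file. It uses Python's `fcntl` module, whose `lockf` wraps the fcntl() locking calls, so it is Unix only. `child_can_lock` and `demo` are hypothetical helpers. The open question in this thread is whether a cluster filesystem such as GFS enforces the same semantics between processes on *different* machines, and a single-machine sketch cannot show that.

```python
import fcntl
import os
import tempfile


def child_can_lock(path: str) -> bool:
    """Fork a child that tries a non-blocking exclusive fcntl() lock on `path`.

    Returns True if the child obtained the lock, False if another process holds it.
    """
    pid = os.fork()
    if pid == 0:  # child process
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)  # got the lock; the kernel releases it when the child exits
        except OSError:
            os._exit(1)  # EACCES/EAGAIN: a conflicting lock is held elsewhere
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def demo() -> tuple:
    with tempfile.NamedTemporaryFile() as tmp:
        fcntl.lockf(tmp.fileno(), fcntl.LOCK_EX)  # parent takes an exclusive lock
        while_locked = child_can_lock(tmp.name)
        fcntl.lockf(tmp.fileno(), fcntl.LOCK_UN)  # parent releases it
        after_unlock = child_can_lock(tmp.name)
    return while_locked, after_unlock
```

`demo()` should report that the child is refused while the parent holds the lock and succeeds after it is released. These locks are per process and are not inherited across `fork()`, which is why the check has to run in a separate child.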

Hmm... I am not sure that a distributed file system is what we want here. I 
think it might be better to have separate postgres databases on separate 
local file systems, and handle putting the data together on a higher level. 
I think this would be better for both performance and scalability.

***yes... but WHEN can we have these features? No one has done it so 
far; I've asked around and searched, but got almost no replies.

Having one big file system is likely to incur heavy network traffic 
penalties, and that is not necessary, as it can be avoided by just having 
the distribution done on a database level, rather than file system level.
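The alternative proposed here keeps a separate database on each machine's local disk and does the distribution one level up. Below is a toy sketch of that share-nothing idea. It assumes hash partitioning on a string key, which is only one possible scheme since the thread does not prescribe one. In-memory dicts stand in for the per-node databases, and `ShardedStore` is a hypothetical name.

```python
import zlib


class ShardedStore:
    """Toy share-nothing layout: each key lives on exactly one node."""

    def __init__(self, n_nodes: int):
        # Each dict stands in for one node's local database on its own disk.
        self.nodes = [{} for _ in range(n_nodes)]

    def node_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def insert(self, key: str, row: dict) -> None:
        self.nodes[self.node_for(key)][key] = row  # touches one node only

    def get(self, key: str):
        return self.nodes[self.node_for(key)].get(key)  # single-node lookup

    def select(self, predicate):
        # A query without the partition key must fan out to every node;
        # the partial results are merged at this higher level.
        return [row for node in self.nodes for row in node.values() if predicate(row)]
```

Lookups by key touch a single machine and generate no cross-node traffic, which is the performance argument being made. Queries that do not use the key pay for a fan-out to every node instead.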

But then again, the distributed file system can be seen as a "neater" 
solution, and it might work rather well, if they get the caching right with 
the correct "near-line" distribution of data across the network file system 
to make sure that the data is where it is most useful. In other words, make 
sure that the files (or even better, inodes) that are frequently accessed 
by a computer are on that computer.

Still there is the issue of replication and redundancy.

***GFS does it transparently.

I just think that
for a database application, this would be best done on the database level, 
rather than a file system level, unless the distributed file system in use 
was designed with all the database-useful features in mind.

  Another issue is that DIPC doesn't have a failover mechanism.

Again, for a database, it might be best to handle it at a higher level.
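Handling failover above DIPC could start with a client that tries nodes in order and skips any node that refuses the connection. The following is a minimal sketch. `connect` is a hypothetical callable that stands in for a real driver's connect function and is assumed to raise `ConnectionError` on failure.

```python
from typing import Callable, List, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(nodes: Sequence[str], connect: Callable[[str], Conn]) -> Conn:
    """Return a connection to the first node that accepts one."""
    errors: List[str] = []
    for node in nodes:
        try:
            return connect(node)
        except ConnectionError as exc:  # node down or unreachable: try the next one
            errors.append(f"{node}: {exc}")
    raise ConnectionError("all nodes failed: " + "; ".join(errors))
```

This sketch only covers picking a live node at connect time. It does not address what happens to the shared memory and semaphores if the machine that created them fails.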

  This is a shared-all approach. It's not the best, but it's probably the 
  fastest (if crude) solution to implement, needing only small modifications 
(4-5 lines) to the postgresql sources.

Indeed. As such, it should probably be the first thing to do toward 
"clustering" a database. Still, it would be good to have a clear 
development path, even though on that path we kludge things slightly at 
various steps in order to have a usable system now, as opposed to a 
"perfect" system later.


*** yes, I want clustering now... and I'm alone.
In my opinion, if GFS supports fcntl (and we can ask the GFS people, I 
think), the stuff in this email can be done quickly.


A shared-all approach is not necessarily that bad. It is (as far as I can 
tell) neither better nor worse than a "share nothing" approach. They both have 
pros and cons. Ideally, we should work toward coming up with an idea for a 
hybrid system that would pick the best of both worlds.

  This system can give a sort of single system image, which is also useful 
  for distributing software other than postgresql.

Indeed. This is always a good thing for scalability for most applications, 
but databases have their specific requirements which may not be best 
catered for by standard means of distributed processing. Still, what you 
are suggesting would be a major improvement, from where I'm looking at it, 
but I am probably biased by looking at it from the point of view of my 
particular application.

  Also Mariposa (http://s2k-ftp.cs.berkeley.edu:8000/mariposa/) seems  
interesting, but it's not maintained and it's for an old postgresql 
version.

Hmm... Most interesting. There could be something recyclable in there. Must 
look at the specs and some source later...


*** I've compiled it, but with no results.
An idea is to get diff to corresponding pure 

Re: [GENERAL] Running several postmaster using same database in parallel

2000-11-23 Thread Valter Mazzola

many users have asked for this feature (i.e. load balancing and clustering 
of a single postgresql database), but there has been no answer on the 
mailing list and no planning for this important feature; I don't 
understand why.
Why not set up a site on postgresql.org to fund this project with donations?

valter mazzola

