Re: [GENERAL] why postgresql over other RDBMS

2007-07-18 Thread Naz Gassiep
Surely such a use case could, and more to the point *should*, be met
using PITR?
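(For reference, a minimal sketch of the PITR route on 8.x, assuming WAL
archiving is already configured; the label and archive path below are
made up:)

    -- On the production server, bracket a file-level copy of $PGDATA:
    SELECT pg_start_backup('schema-test');  -- 'schema-test' is an arbitrary label
    -- ... copy $PGDATA to a scratch machine with tar/rsync here ...
    SELECT pg_stop_backup();
    -- On the scratch server, recovery.conf replays the archived WAL, e.g.
    --   restore_command = 'cp /mnt/wal_archive/%f %p'
    -- then rehearse the schema changes there before touching production.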

Regards,
- Naz.

Alvaro Herrera wrote:

A.M. wrote:
  

On May 24, 2007, at 14:29, Wiebe Cazemier wrote:



On Thursday 24 May 2007 17:30, Alexander Staubo wrote:

  

[2] Nobody else has this, I believe, except possibly Ingres and
NonStop SQL. This means you can do a begin transaction, then issue
create table, alter table, etc. ad nauseam, and in the meantime
concurrent transactions will just work. Beautiful for atomically
upgrading a production server. Oracle, of course, commits after each
DDL statement.

If this is such a rare feature, I'm very glad we chose postgresql. I use
it all the time, and wouldn't know what to do without it. We circumvented
Ruby on Rails' migrations, and just implemented them in SQL. Writing
migrations is a breeze this way, and you don't have to hassle with
atomicity, or the pain when you discover the migration doesn't work on
the production server.
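(For the archives, a minimal sketch of such a migration; the table and
column names are made up, but the shape is the point: any failure rolls
back every step:)

    BEGIN;
    -- DDL is just another statement inside the transaction
    CREATE TABLE tags (id serial PRIMARY KEY, name text NOT NULL);
    ALTER TABLE posts ADD COLUMN tag_id integer REFERENCES tags(id);
    -- backfill as part of the same atomic change
    INSERT INTO tags (name) VALUES ('general');
    UPDATE posts SET tag_id = 1;
    COMMIT;  -- or ROLLBACK, and the schema is exactly as it was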
  
Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
sub-transaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.

Unfortunately, the postgresql architecture wouldn't lend itself well
to this. Still, it seems like a basic extension of the notion of
sub-transactions.



Hmm, doesn't this Just Work with two-phase commit?
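(Partly, at least. A sketch, assuming 8.1+ for two-phase commit and
max_prepared_transactions > 0; the gid 'migration-1' and the 'posts'
table are made up. A prepared transaction survives the connection that
started it, though other sessions still can't see its uncommitted state,
and it holds its locks until resolved:)

    BEGIN;
    ALTER TABLE posts ADD COLUMN score integer;  -- 'posts' is a made-up table
    PREPARE TRANSACTION 'migration-1';           -- persists past this connection
    -- later, from any other connection (or after a restart):
    COMMIT PREPARED 'migration-1';               -- or ROLLBACK PREPARED 'migration-1'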

  


Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Bruce Momjian

Added to TODO:

* Allow multiple indexes to be created concurrently, ideally via a
  single heap scan, and have a restore of a pg_dump somehow use it

  http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php


---

Ron Johnson wrote:
 On 06/01/07 11:22, Bruce Momjian wrote:
  PFC wrote:
  On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:
 
  On May 25, 2007, at 5:28 PM, Tom Lane wrote:
 
  That's true at the level of DDL operations, but AFAIK we could
  parallelize table-loading and index-creation steps pretty effectively
  --- and that's where all the time goes.
  I would be happy with parallel builds of the indexes of a given table.   
  That way you have just one scan of the whole table to build all its  
  indexes.
 Will the synchronized seq scan patch be able to do this by issuing all
  the CREATE INDEX commands at the same time from several different
  database connections?
  
  No, but it could someday.
 
 Or would a CREATE MANY INDEXES (where in one statement you specify 
 all the indexes on a single table) command be easier to implement?
 
 This way also the process reads the table once, building separate 
 sortwork files on-the-fly.  Too bad child processes can't inherit 
 transaction state.
 
 -- 
 Ron Johnson, Jr.
 Jefferson LA  USA
 
 Give a man a fish, and he eats for a day.
 Hit him with a fish, and he goes away for good!
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Matthew T. O'Connor

Bruce Momjian wrote:

Added to TODO:

* Allow multiple indexes to be created concurrently, ideally via a
  single heap scan, and have a restore of a pg_dump somehow use it

  http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php



Would it not also make sense to use this ability for a 
non-index-specific REINDEX command?




Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Bruce Momjian
Matthew T. O'Connor wrote:
 Bruce Momjian wrote:
  Added to TODO:
  
  * Allow multiple indexes to be created concurrently, ideally via a
single heap scan, and have a restore of a pg_dump somehow use it
  
http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php
 
 
 Would it not also make sense to use this ability for a 
 non-index-specific REINDEX command?

Not sure, but I suppose.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 Matthew T. O'Connor wrote:
 Bruce Momjian wrote:
  
  * Allow multiple indexes to be created concurrently, ideally via a
single heap scan, and have a restore of a pg_dump somehow use it

Actually, the sync scan patch ought to make this more or less happen
magically. If you start a bunch of concurrent index builds they will try to
scan the table together. 

There's no useful way for pg_dump to make use of this since it only has one
backend. And you still need to generate n copies of the data for sorting. And
performing n sorts in parallel won't be as cache-efficient as doing them one
after the other. So there's still a use case for the TODO item.

But the hole is not nearly as urgent as before. You can get most of the
benefit if you really need it by rolling your own. And the cool thing is some
people already have rolled their own and they'll just magically see an
improvement. They don't have to do anything they weren't doing already to turn
it on.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  Matthew T. O'Connor wrote:
  Bruce Momjian wrote:
   
   * Allow multiple indexes to be created concurrently, ideally via a
 single heap scan, and have a restore of a pg_dump somehow use it
 
 Actually, the sync scan patch ought to make this more or less happen
 magically. If you start a bunch of concurrent index builds they will try to
 scan the table together. 
 
 There's no useful way for pg_dump to make use of this since it only has one
 backend. And you still need to generate n copies of the data for sorting. And
 performing n sorts in parallel won't be as cache efficient as doing them one
 after the other. So there's still a use case for the TODO 
 
 But the hole is not nearly as urgent as before. You can get most of the
 benefit if you really need it by rolling your own. And the cool thing is some
 people already have rolled their own and they'll just magically see an
 improvement. They don't have to do anything they weren't doing already to turn
 it on.

They could roll their own a lot easier if you had finished the psql
concurrent patch.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 They could roll their own a lot easier if you had finished the psql
 concurrent patch.

I did. But you decided you didn't want it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Bruce Momjian
Gregory Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  They could roll their own a lot easier if you had finished the psql
  concurrent patch.
 
 I did. But you decided you didn't want it.

As far as I know, we asked for a libpq API change and you ignored
multiple requests.  You want the URLs?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Bruce Momjian
Bruce Momjian wrote:
 Gregory Stark wrote:
  
  Bruce Momjian [EMAIL PROTECTED] writes:
  
   They could roll their own a lot easier if you had finished the psql
   concurrent patch.
  
  I did. But you decided you didn't want it.
 
 As far as I know, we asked for a libpq API change and you ignored
 multiple requests.  You want the URLs?

Never mind.  You ignored requests.  I am not going to address this
further.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-07-16 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 As far as I know, we asked for a libpq API change and you ignored
 multiple requests.  You want the URLs?

No. One suggestion was made regarding an internal technical issue (polling
versus using select/poll on the sockets). I made that change long ago,
though I had questions about the best way to do it which were never
answered, so even the way I made that change might not be acceptable.

In any case, this was just one suggestion, based on discussion that
happened to turn up on-list, made without actually reading the rest of the
code. If you're interested in reviewing the patch I'm sure you would find
dozens of issues. I would be happy to rework it along whatever lines you
want.

But I would prefer to see people focus on reviewing major features like HOT,
clustered indexes, GII (which I would suggest calling index organized tables
since that's effectively what they are).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] why postgresql over other RDBMS

2007-06-04 Thread Alban Hertroys
Tom Lane wrote:
 Right.  Multiple seqscans that are anywhere near reading the same block
 of a table will tend to self-synchronize.  There is a patch under
 consideration for 8.3 that helps this along by making seqscans run
 circularly --- that is, not always from block 0 to block N, but from
 block M to N and then 0 to M-1, where the start point M can be chosen
 by looking to see where any other concurrent seqscan is presently
 reading.  Once you've got a reasonable start point, you don't have to do
 anything else.
 
   regards, tom lane

Interesting concept (as expected from you guys).

Would that imply that the sequential scan of one connection could place
data into the disk cache that another parallel seq scan would need soon?
Would that speed up parallel seq scans? Or am I being optimistic here?

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //



Re: [GENERAL] why postgresql over other RDBMS

2007-06-03 Thread Scott Ribe
 So it works right now, except it doesn't have (yet) the infrastructure to
 keep the scans synchronized

Perhaps you only got one read of the table because the process is
essentially self-synchronizing. Whenever one process gets ahead, it
requires a disk read for the next page, which causes it to block for a
relatively long time, during which time the other two processes either
proceed reading rows from cache, or come to the end of the cache and block
waiting for the same page to be read from disk. Obviously not a guarantee,
as indexing a relatively more expensive type COULD cause one process to get
multiple pages behind, and memory usage by other processes COULD cause
intervening pages to be flushed from cache. But I have a suspicion that the
experiment was not just a happy fluke, that there will be a strong tendency
for multiple simultaneous index operations to stay sufficiently closely
synch'd that the table will only be read from disk once. (Especially when
such operations are done while the database is otherwise quiescent, as would
be the typical case during a restore.)

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice





Re: [GENERAL] why postgresql over other RDBMS

2007-06-03 Thread PFC


I believe you have made quite a good description of what happens.

	Index-building isn't very CPU-intensive for integers (geometrics and
tsearch are another matter, of course), so building all indexes of a large
table in one pass is a possibility that works now, provided you issue all
the create index commands in concurrent connections at roughly the same
time.

I don't think pg_restore does this, though.


So it works right now, except it doesn't have (yet) the infrastructure to
keep the scans synchronized

Perhaps you only got one read of the table because the process is
essentially self-synchronizing. Whenever one process gets ahead, it
requires a disk read for the next page, which causes it to block for a
relatively long time, during which time the other two processes either
proceed reading rows from cache, or come to the end of the cache and block
waiting for the same page to be read from disk. Obviously not a guarantee,
as indexing a relatively more expensive type COULD cause one process to get
multiple pages behind, and memory usage by other processes COULD cause
intervening pages to be flushed from cache. But I have a suspicion that the
experiment was not just a happy fluke, that there will be a strong tendency
for multiple simultaneous index operations to stay sufficiently closely
synch'd that the table will only be read from disk once. (Especially when
such operations are done while the database is otherwise quiescent, as would
be the typical case during a restore.)







Re: [GENERAL] why postgresql over other RDBMS

2007-06-03 Thread Tom Lane
Scott Ribe [EMAIL PROTECTED] writes:
 So it works right now, except it doesn't have (yet) the infrastructure to
 keep the scans synchronized

 Perhaps you only got one read of the table because the process is
 essentially self-synchronizing.

Right.  Multiple seqscans that are anywhere near reading the same block
of a table will tend to self-synchronize.  There is a patch under
consideration for 8.3 that helps this along by making seqscans run
circularly --- that is, not always from block 0 to block N, but from
block M to N and then 0 to M-1, where the start point M can be chosen
by looking to see where any other concurrent seqscan is presently
reading.  Once you've got a reasonable start point, you don't have to do
anything else.

regards, tom lane



Re: [GENERAL] why postgresql over other RDBMS

2007-06-02 Thread PFC
On Sat, 02 Jun 2007 00:14:28 +0200, Ron Johnson [EMAIL PROTECTED]  
wrote:



On 06/01/07 16:38, PFC wrote:


Will the synchronized seq scan patch be able to do this by issuing all
the CREATE INDEX commands at the same time from several different
database connections?

 No, but it could someday.

 Actually I tested, it does it right now, albeit unconsciously (pg
doesn't do anything to synchronize the scans, but if you launch the
concurrent connections at the same time and issue all your create
index at the same time, only 1 table scan is needed). Maybe if the
tables were bigger, it would lose sync between the 3 concurrent scans
and would end up going slower. That's why I spoke about the
synchronized scan patch.


How much of this, though, is from the OS's disk cache?  Or are Seq Scans  
O_DIRECT and bypass the OS cache?



	Well, the file was larger than the disk cache, and I checked vmstat's
count of actual bytes read from disk...

Three threads read the table once; one thread reads the table 3 times.

	So it works right now, except it doesn't have (yet) the infrastructure
to keep the scans synchronized, and psql can't open several connections
(yet).




Re: [GENERAL] why postgresql over other RDBMS

2007-06-02 Thread Jeremy Harris

On 06/01/07 11:22, Bruce Momjian wrote:

 PFC wrote:

 On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:


 On May 25, 2007, at 5:28 PM, Tom Lane wrote:


 That's true at the level of DDL operations, but AFAIK we could
 parallelize table-loading and index-creation steps pretty effectively
 --- and that's where all the time goes.
 I would be happy with parallel builds of the indexes of a given table.   
 That way you have just one scan of the whole table to build all its  
 indexes.
 	Will the synchronized seq scan patch be able to do this by issuing all  
 the CREATE INDEX commands at the same time from several different database  
 connections ?
 
 No, but it could someday.


Would it be possible to track stats sufficient for a cost/benefit based
automatic recreate of all indices on a table whenever a full-table-scan
occurred, whether due to a commanded index rebuild or not?

Cheers,
  Jeremy Harris



Re: [GENERAL] why postgresql over other RDBMS

2007-06-01 Thread Bruce Momjian
PFC wrote:
 On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:
 
 
  On May 25, 2007, at 5:28 PM, Tom Lane wrote:
 
  That's true at the level of DDL operations, but AFAIK we could
  parallelize table-loading and index-creation steps pretty effectively
  --- and that's where all the time goes.
 
  I would be happy with parallel builds of the indexes of a given table.   
  That way you have just one scan of the whole table to build all its  
  indexes.
 
   Will the synchronized seq scan patch be able to do this by issuing all  
 the CREATE INDEX commands at the same time from several different database  
 connections ?

No, but it could someday.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] why postgresql over other RDBMS

2007-06-01 Thread Ron Johnson

On 06/01/07 11:22, Bruce Momjian wrote:

PFC wrote:

On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:


On May 25, 2007, at 5:28 PM, Tom Lane wrote:


That's true at the level of DDL operations, but AFAIK we could
parallelize table-loading and index-creation steps pretty effectively
--- and that's where all the time goes.
I would be happy with parallel builds of the indexes of a given table.   
That way you have just one scan of the whole table to build all its  
indexes.
	Will the synchronized seq scan patch be able to do this by issuing all  
the CREATE INDEX commands at the same time from several different database  
connections ?


No, but it could someday.


Or would a CREATE MANY INDEXES (where in one statement you specify 
all the indexes on a single table) command be easier to implement?


This way the process also reads the table once, building separate
sort-work files on the fly.  Too bad child processes can't inherit
transaction state.


--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-06-01 Thread PFC


	Will the synchronized seq scan patch be able to do this by issuing  
all  the CREATE INDEX commands at the same time from several different  
database  connections ?

 No, but it could someday.


	Actually I tested, it does it right now, albeit unconsciously (pg doesn't  
do anything to synchronize the scans, but if you launch the concurrent  
connections at the same time and issue all your create index at the same  
time, only 1 table scan is needed). Maybe if the tables were bigger, it  
would lose sync between the 3 concurrent scans and would end up going  
slower. That's why I spoke about the synchronized scan patch.


Or would a CREATE MANY INDEXES (where in one statement you specify all  
the indexes on a single table) command be easier to implement?


	You can get the effect right now by using concurrent connections it  
seems. Not very practical in a psql script, though...




Re: [GENERAL] why postgresql over other RDBMS

2007-06-01 Thread Ron Johnson

On 06/01/07 16:38, PFC wrote:


Will the synchronized seq scan patch be able to do this by 
issuing all  the CREATE INDEX commands at the same time from several 
different database  connections ?

 No, but it could someday.


Actually I tested, it does it right now, albeit unconsciously (pg 
doesn't do anything to synchronize the scans, but if you launch the 
concurrent connections at the same time and issue all your create 
index at the same time, only 1 table scan is needed). Maybe if the 
tables were bigger, it would lose sync between the 3 concurrent scans 
and would end up going slower. That's why I spoke about the 
synchronized scan patch.


How much of this, though, is from the OS's disk cache?  Or are Seq 
Scans O_DIRECT and bypass the OS cache?


Or would a CREATE MANY INDEXES (where in one statement you specify 
all the indexes on a single table) command be easier to implement?


You can get the effect right now by using concurrent connections it 
seems. Not very practical in a psql script, though...


--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-31 Thread Vivek Khera


On May 25, 2007, at 5:28 PM, Tom Lane wrote:


That's true at the level of DDL operations, but AFAIK we could
parallelize table-loading and index-creation steps pretty effectively
--- and that's where all the time goes.


I would be happy with parallel builds of the indexes of a given  
table.  That way you have just one scan of the whole table to build  
all its indexes.





Re: [GENERAL] why postgresql over other RDBMS

2007-05-31 Thread PFC

On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:



On May 25, 2007, at 5:28 PM, Tom Lane wrote:


That's true at the level of DDL operations, but AFAIK we could
parallelize table-loading and index-creation steps pretty effectively
--- and that's where all the time goes.


I would be happy with parallel builds of the indexes of a given table.   
That way you have just one scan of the whole table to build all its  
indexes.


	Will the synchronized seq scan patch be able to do this by issuing all  
the CREATE INDEX commands at the same time from several different database  
connections ?




Re: [GENERAL] why postgresql over other RDBMS

2007-05-31 Thread PFC

On Thu, 31 May 2007 23:36:32 +0200, PFC [EMAIL PROTECTED] wrote:


On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:



On May 25, 2007, at 5:28 PM, Tom Lane wrote:


That's true at the level of DDL operations, but AFAIK we could
parallelize table-loading and index-creation steps pretty effectively
--- and that's where all the time goes.


I would be happy with parallel builds of the indexes of a given table.   
That way you have just one scan of the whole table to build all its  
indexes.


	Just did a test:

- large table (does not fit in RAM)
- rows with a text column (forum posts)
- about 700K rows

Time to create 3 indexes: 61 s
Time to create 3 indexes with 3 simultaneous connections: 22 s

	That's what you would expect...
	vmstat shows the data is really loaded from disk: once with the 3
threads, 3 times when the indexes are created one at a time.
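(A sketch of the rolled-by-hand version; 'forumdb' and the index and
column names are made up. Each psql gets its own backend, so the three
scans can overlap:)

    #!/bin/sh
    # start the three builds at (roughly) the same time, one backend each
    psql forumdb -c "CREATE INDEX posts_author_idx ON posts (author_id)" &
    psql forumdb -c "CREATE INDEX posts_topic_idx  ON posts (topic_id)"  &
    psql forumdb -c "CREATE INDEX posts_date_idx   ON posts (posted_at)" &
    wait  # returns once all three CREATE INDEX commands have finished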




Re: [GENERAL] why postgresql over other RDBMS

2007-05-28 Thread Andrew Sullivan
On Sat, May 26, 2007 at 02:33:05PM -0400, Tom Lane wrote:
 making userids be user@host, not just a username).  This is probably why
 they still haven't got IPv6 support:
 http://bugs.mysql.com/bug.php?id=8836
 I wonder what their plans are for fixing that ...

Panic at the last second, like everyone else?  It's actually
remarkable how many systems still have poor or dodgy IPv6 support. 
And the management tools for IPv6 remain sort of awful, given the
immense address space that people are going to have to deal with
(there's 128 bits in an IPv6 address).  But I guess MySQL AB will
have to figure out something soon, since even ARIN is on the IPv6
wagon now:

http://www.arin.net/announcements/20070521.html

(In the American-continents Internet operations world, this has
caused quite a kerfuffle, because many people have mostly ignored
IPv6 for some time.  And there are a lot of grotty corners to IPv6
for which there are no analogies in IPv4.  Uh, scoped addresses,
anyone? ;-)

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Information security isn't a technological problem.  It's an economics
problem.
--Bruce Schneier



Re: [GENERAL] why postgresql over other RDBMS

2007-05-27 Thread Lincoln Yeoh

At 03:25 AM 5/25/2007, A.M. wrote:


Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
sub-transaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.

Unfortunately, the postgresql architecture wouldn't lend itself well
to this. Still, it seems like a basic extension of the notion of
sub-transactions.


I've proposed this for postgresql before (7 years ago?), but for a 
different reason. Didn't seem to get much interest though.


The idea was people wouldn't have to reinvent/reimplement
transactions for stuff like webapps. Of course, you might have to have
a separate database for shopping carts or other persistent
transactions, so that outstanding transactions don't cause lots of
unrelated transient rows to be kept unreclaimable.


And also I was thinking it would be good to decouple the maximum 
number of concurrent transactions supported by postgresql from the 
maximum number of concurrent connections supported.


Issues: what should happen if multiple connections try to continue
the _same_ transaction?


Also should keep in mind clustering support though - this might 
create interesting problems and features in a clustering scenario :).


Regards,
Link.




Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Ron Johnson

On 05/25/07 23:02, Chris Browne wrote:
 [EMAIL PROTECTED] (Alvaro Herrera) writes:
 Erik Jones wrote:

 And, to finish up, is there any reason that pg_restore couldn't  
 already work with separate processes working in parallel?
 The problem is that the ordering of objects in the dump is the only
 thing that makes the dump consistent with regards to the dependencies of
 objects.  So pg_restore cannot make any assumptions of parallelisability
 of the restoration process of objects in the dump.

 pg_dump is the only one who has the dependency information.

 If that information were to be saved in the dump, then maybe pg_restore
 could work in parallel.  But it seems a fairly non-trivial thing to do.

 Mind you, while I am idling at this idea, it seems that just having
 multiple processes generating a dump is not such a hot idea by itself,
 because you then have no clue on how to order the restoration of the
 multiple files that are going to result.
 
 I think it's less bad than you think.
 
 The really timeconsuming bits of pg_restore are:
 
 1. the loading of table data
 2. creating indices on those tables
 [distant] 3. setting up R/I constraints
 
 If you look at the present structure of pg_dump output, those are all
 pretty visibly separate steps.
 
 pg_dump output [loosely] consists of:
 - Type definitions & such
 - Table definitions
 - loading table data  (e.g. - 1)
 - stored function definitions
 - indices (e.g. - parts of 2)
 - primary keys(e.g. - the rest of 2)
 - triggers + rules(including 3)
 
 Thus, a parallel load would start by doing some things in a serial
 fashion, namely creating types and tables.  This isn't a
 parallelizable step, but so what?  It shouldn't take very long.

Which would be sped up by having pg_dump create multiple output files.

Of course, as I see it, this is only of real benefit when you are
using tablespaces spread across multiple RAID devices on a SAN or
multiple SCSI cards.  But then, organizations with lots of data
usually have that kind of h/w.

 The parallel load can load as many tables concurrently as you choose;
 since there are no indices or R/I triggers, those are immaterial
 factors.
 
 Generating indices and primary keys could, again, be parallelized
 pretty heavily, and have (potentially) heavy benefit.
 
 Furthermore, an interesting thing to do might be to use the same
 approach that Slony-I does, at present, for subscriptions.  It
 temporarily deactivates triggers and indices while loading the data,
 then reactivates them, and requests a re-index.  That would permit
 loading the *entire* schema, save for data, and then load and index
 with fairly much maximum possible efficiency.
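(Per table, that Slony-style sequence is roughly the following sketch;
'items' and the file path are stand-in names, and DISABLE TRIGGER ALL
assumes 8.1+:)

    -- load with user and FK triggers switched off
    ALTER TABLE items DISABLE TRIGGER ALL;
    COPY items FROM '/restore/items.copy';  -- hypothetical dump file
    ALTER TABLE items ENABLE TRIGGER ALL;
    -- then rebuild the table's indexes in one pass
    REINDEX TABLE items;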
 
 That seems like a not-completely-frightening SMOP (simple matter of
 programming).  Not completely trivial, but not frighteningly
 non-trivial...

pg_dump would have to be smart enough to rationally split the data
into N output files, and that would get tricky (impossible?) if most
of your data is in one *huge* unpartitioned table in a single
tablespace.  Que sera.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 Erik Jones wrote:
 And, to finish up, is there any reason that pg_restore couldn't  
 already work with separate processes working in parallel?
 
 The problem is that the ordering of objects in the dump is the only
 thing that makes the dump consistent with regards to the dependencies of
 objects.  So pg_restore cannot make any assumptions of parallelisability
 of the restoration process of objects in the dump.
 
 That's true at the level of DDL operations, but AFAIK we could
 parallelize table-loading and index-creation steps pretty effectively
 --- and that's where all the time goes.

Yes, loading the data and creating the indexes is the most time-consuming
part of a large dump-and-reload cycle.

 
 A more interesting question is what sort of hardware you need for that
 actually to be a win, though.  Loading a few tables in parallel sounds
 like an ideal recipe for oversaturating your disk bandwidth...

You don't actually need that much disk bandwidth; both COPY and CREATE
INDEX are CPU-bottlenecked on modern boxes with reasonable disk
subsystems, so spreading their work over multiple cores/processes can
give big benefits.
For example, I have managed to load ~2B rows (5 integer columns, no
indexes) at a rate of about 320,000 rows/s on a modern (but already 1.5
years old) 4-core 2.6GHz Opteron box (with 12 disks and BBWC, iirc).
Using 4 concurrent processes to load the data resulted in about 930,000
rows/s loaded (hitting the disk-io limit at that rate).
So having the ability to parallelize those operations at both the dump
and the restore level would be a huge win.
A manual experiment I did a while back, doing that by hand (i.e.
splitting the dump manually and feeding it in parallel with a
concurrency of 2) on a copy of a production database, brought the
restore time down from 3h+ to a bit less than 2 hours.
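(The by-hand version is roughly the following sketch; the file names are
made up, and the split has to respect object dependencies: schema first,
data next, indexes and constraints last:)

    #!/bin/sh
    psql proddb -f schema.sql        # serial: tables must exist first
    psql proddb -f data_part1.sql &  # independent chunks in parallel
    psql proddb -f data_part2.sql &
    wait
    psql proddb -f indexes.sql       # serial, or parallelized the same way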


Stefan



Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A more interesting question is what sort of hardware you need for that
 actually to be a win, though.  Loading a few tables in parallel sounds
 like an ideal recipe for oversaturating your disk bandwidth...

 you don't actually need that much of disk bandwidth both COPY and CREATE
 INDEX are CPU bottlenecked on modern boxes and reasonable disk
 subsystems - spreading their work over multiple cores/processes can give
 big benefits.

Hmm ... I wonder if that's true for COPY BINARY ...

regards, tom lane



Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Harpreet Dhaliwal

Is the host-based configuration methodology in postgres superior to
other RDBMSs?
Is this something novel that postgres has come up with?

~Harpreet

On 5/26/07, Tom Lane [EMAIL PROTECTED] wrote:


Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A more interesting question is what sort of hardware you need for that
 actually to be a win, though.  Loading a few tables in parallel sounds
 like an ideal recipe for oversaturating your disk bandwidth...

 you don't actually need that much of disk bandwidth both COPY and CREATE
 INDEX are CPU bottlenecked on modern boxes and reasonable disk
 subsystems - spreading their work over multiple cores/processes can give
 big benefits.

Hmm ... I wonder if that's true for COPY BINARY ...

regards, tom lane




Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Zoltan Boszormenyi

If you ask me, yes. When I had to choose between MySQL 3.x and
PostgreSQL 6.5 long ago, and I was able to lock the DB superuser
out of MySQL with REVOKE CONNECT, I said no, thanks.
I did it on purpose to prove that you can the external configuration
is better in this case.

And apart from fixing pg_hba.conf after you move the machine,
PostgreSQL is quite location-agnostic network-wise.
You can modify the IP address[es] and FQDN of the machine,
which is not easily doable if you use e.g. Informix, where the hostname
is stored deep inside the DB and some subsystems break if it changes.
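(For anyone following along, the host-based part is just a few editable
lines of pg_hba.conf; the database name and addresses below are examples:)

    # TYPE  DATABASE  USER  CIDR-ADDRESS      METHOD
    local   all       all                     ident sameuser
    host    all       all   127.0.0.1/32      md5
    host    mydb      all   192.168.1.0/24    md5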

Harpreet Dhaliwal wrote:
is the host base configuration methodology in postgres superior to 
other RDBMS.

is this something novel that postgres has come up with?

~Harpreet

On 5/26/07, Tom Lane [EMAIL PROTECTED] wrote:


Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A more interesting question is what sort of hardware you need
for that
 actually to be a win, though.  Loading a few tables in parallel
sounds
 like an ideal recipe for oversaturating your disk bandwidth...

 you don't actually need that much of disk bandwidth both COPY
and CREATE
 INDEX are CPU bottlenecked on modern boxes and reasonable disk
 subsystems - spreading their work over multiple cores/processes
can give
 big benefits.

Hmm ... I wonder if that's true for COPY BINARY ...

regards, tom lane






--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/




Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

If you ask me, yes. When I had to choose between MySQL 3.x and
PostgreSQL 6.5 a long ago and I was able to exclude the DB superuser
with REVOKE CONNECT from MySQL, I said no, thanks.
I did it on purpose to prove that you can the external configuration
is better in this case.


I wanted to write: you can re-enable the superuser to fix problems later,
so the external configuration is better.

And sorry for the top-posting.


And apart from fixing pg_hba.conf after you move the machine,
PostgreSQL is quite location agnostic network-wise.
You can modify the IP address[es] and FQDN of  the machine,
which is not easily doable if you use e.g. Informix where the hostname
is stored deep inside the DB and some subsystems break if it changes.

Harpreet Dhaliwal wrote:
is the host base configuration methodology in postgres superior to 
other RDBMS.

is this something novel that postgres has come up with?

~Harpreet

On 5/26/07, Tom Lane [EMAIL PROTECTED] wrote:


Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A more interesting question is what sort of hardware you need
for that
 actually to be a win, though.  Loading a few tables in parallel
sounds
 like an ideal recipe for oversaturating your disk bandwidth...

 you don't actually need that much of disk bandwidth both COPY
and CREATE
 INDEX are CPU bottlenecked on modern boxes and reasonable disk
 subsystems - spreading their work over multiple cores/processes
can give
 big benefits.

Hmm ... I wonder if that's true for COPY BINARY ...

regards, tom lane









--
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/




Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 Harpreet Dhaliwal írta:
 is the host base configuration methodology in postgres superior to 
 other RDBMS.

 If you ask me, yes. When I had to choose between MySQL 3.x and
 PostgreSQL 6.5 a long ago and I was able to exclude the DB superuser
 with REVOKE CONNECT from MySQL, I said no, thanks.
 I did it on purpose to prove that you can the external configuration
 is better in this case.

 And apart from fixing pg_hba.conf after you move the machine,
 PostgreSQL is quite location agnostic network-wise.

MySQL has a related problem, which is that they have embedded IPv4
addressing rather deeply into their client authentication logic (by
making userids be user@host, not just a username).  This is probably why
they still haven't got IPv6 support:
http://bugs.mysql.com/bug.php?id=8836
I wonder what their plans are for fixing that ...

regards, tom lane



Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 A more interesting question is what sort of hardware you need for that
 actually to be a win, though.  Loading a few tables in parallel sounds
 like an ideal recipe for oversaturating your disk bandwidth...
 
 you don't actually need that much of disk bandwidth both COPY and CREATE
 INDEX are CPU bottlenecked on modern boxes and reasonable disk
 subsystems - spreading their work over multiple cores/processes can give
 big benefits.
 
 Hmm ... I wonder if that's true for COPY BINARY ...

Not sure on that - I was simply trying to say that even a simple
parallel dump & restore capability could result in a serious improvement
for people running large databases (especially considering that one can
now buy 1U boxes with 8+ cores, or 2U boxes that can hold 14+ disks)  ;-)


Stefan



Re: [GENERAL] why postgresql over other RDBMS

2007-05-26 Thread PFC




MySQL has a related problem, which is that they have embedded IPv4
addressing rather deeply into their client authentication logic (by
making userids be user@host, not just a username).  This is probably why
they still haven't got IPv6 support:
http://bugs.mysql.com/bug.php?id=8836
I wonder what their plans are for fixing that ...

regards, tom lane


	Well, I have now almost finished my spare-time forum benchmark of MySQL
versus Postgres...
	It was a really interesting experience, and by turning over a few stones
I discovered some really nasty stuff about MySQL... well, about InnoDB
really. Here's one that you probably didn't know about:


	- auto_increment isn't concurrent; this means all inserts into a table
which has an auto_increment column are serialized

	- yes, this means INSERT INTO ... SELECT locks out all other inserts
	- the lock is held during ON INSERT triggers, which means all INSERT
triggers on a given table can't execute concurrently
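(To make that concrete, a hedged illustration against a 5.0-era InnoDB
table; 'posts' and 'posts_copy' are made-up names, with posts_copy having
an auto_increment primary key:)

    -- session 1: holds the table's AUTO-INC lock for the whole statement
    INSERT INTO posts_copy SELECT * FROM posts;
    -- session 2: even a single-row insert now blocks until session 1
    -- finishes, because posts_copy has an auto_increment column
    INSERT INTO posts_copy (body) VALUES ('hello');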




Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Tom Allison

I think the general topic got lost in the weeds there...

But it would be helpful if you identified what you care about in a  
database.  That drives a lot of these decisions.


Example:  My brother uses MySQL and I use Postgresql.
His reasons for MySQL:
More books on it at the bookstore.
It was already a default build in Rails.
Came installed on his SuSE build.
Didn't like the pg_hba.conf file and preferred to do all the permissions
via SQL tables.


My reasons for Postgresql:
Very focused on consistent data and not losing information when things
go wrong.
The performance curve as you add users is much flatter than MySQL's.
Under MySQL you have better speed with ~3 users, but at 50 users
Postgresql is a clear winner on my hardware.  In general I found it to
scale more smoothly, without a sweet spot.

I prefer pg_hba.conf over tables.
Zero maintenance & great tools.
It's more SQL-standard, so what I do on Oracle I can generally do on
Postgresql and vice versa.


None of this has anything to do with Oracle or SQL Server, but these  
are examples of what is important to each of us.


What's important to you?

On May 24, 2007, at 4:06 AM, Jasbinder Singh Bali wrote:


Hi
I was wondering, apart from extensive procedural language support  
and being free,
what are other major advantages of Postgresql over other major  
RDBMS like oracle and sql server.


Any pointers would be highly appreciated.

Thanks,
~Jas





Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Erik Jones

On May 24, 2007, at 5:21 PM, Chris Browne wrote:


[EMAIL PROTECTED] (A.M.) writes:

On May 24, 2007, at 14:29, Wiebe Cazemier wrote:


On Thursday 24 May 2007 17:30, Alexander Staubo wrote:


[2] Nobody else has this, I believe, except possibly Ingres and
NonStop SQL. This means you can do a begin transaction, then issue
create table, alter table, etc. ad nauseam, and in the meantime
concurrent transactions will just work. Beautiful for atomically
upgrading a production server. Oracle, of course, commits after each
DDL statement.


If this is such a rare feature, I'm very glad we chose postgresql. I use
it all the time, and wouldn't know what to do without it. We circumvented
Ruby on Rails' migrations, and just implemented them in SQL. Writing
migrations is a breeze this way, and you don't have to hassle with
atomicity, or the pain when you discover the migration doesn't work on
the production server.


Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
sub-transaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.

Unfortunately, the postgresql architecture wouldn't lend itself well
to this. Still, it seems like a basic extension of the notion of
sub-transactions.


Jan Wieck had a proposal to a similar effect, namely to give some way
to get one connection to duplicate the state of another one.

This would permit doing a neat parallel decomposition of pg_dump: you
could do a 4-way parallelization of it that would function something
like the following:

- connection 1 opens, establishes the usual serialized-mode transaction

- connection 1 dumps the table metadata into one or more files in a
  specified directory

- then it forks 3 more connections, and seeds them with the same
  serialized-mode state

- it then goes through and can dump 4 tables concurrently at a time,
  one apiece to a file in the directory.

This could considerably improve speed of dumps, possibly of restores,
too.

Note that this isn't related to subtransactions...
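(You can approximate the parallel dump today without the shared-snapshot
part, with the big caveat that the per-table files are not mutually
consistent if anything is writing to the database; the table names here
are illustrative:)

    #!/bin/sh
    pg_dump --schema-only proddb > schema.sql   # serial, cheap
    for t in users posts orders invoices; do    # then data table-by-table
        pg_dump --data-only -t $t proddb > $t.sql &
    done
    wait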


Interesting.  That's actually pretty close to the reindexing
strategy/script that I use, and I've been planning on extending it to a
vacuum strategy.  So, I will add my support for someone building this
kind of support into pg_dump/restore.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Chris Browne
[EMAIL PROTECTED] (Erik Jones) writes:
 On May 24, 2007, at 5:21 PM, Chris Browne wrote:
 Jan Wieck had a proposal to a similar effect, namely to give some way
 to get one connection to duplicate the state of another one.

 This would permit doing a neat parallel decomposition of pg_dump: you
 could do a 4-way parallelization of it that would function something
 like the following [elided]:

 Interesting.  That's actually pretty close to the reindexing strategy/
 script that I use and I've been planning on extending it to a vacuum
 strategy.  So, I will add my support into someone building this kind
 of support into pg_dump/restore.

Well, I think that particular idea is dead for 8.3, as there wasn't
agreement that there were enough relevant use-cases.

If discussion gets bombarded with "yes, yes, that's useful for me
too!" responses the next time it gets proposed, then that will
increase the chances of acceptance.

We seem to be suffering, as the community and patch queue grow,
from the problem that features regarded as being useful only
to small sets of users are seeing greater reluctance for acceptance.
-- 
output = (cbbrowne @ acm.org)
http://linuxdatabases.info/info/linuxxian.html
Is your pencil Y2K certified?  Do you know the possible effects if it
isn't?



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Andrew Sullivan
On Fri, May 25, 2007 at 12:36:20PM -0400, Chris Browne wrote:
 We seem to be suffering, as the community, and patch queue, grows,
 from the problem that features that are regarded as being useful only
 to small sets of users are seeing greater reluctance for acceptance.

Another way of expressing that regards it as a positive benefit:
given finite numbers of developers, testers, and patch reviewers, we
as a community have to make decisions about how big a feature set we
can realistically support, and the value that contributes to the user
community.  A small potential user community probably means a lower
estimation of the value of the feature.  So features that seem sort
of boutique are to be regarded at least with scepticism, in order to
keep the code useful for everyone.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Ron Johnson

On 05/25/07 12:18, Andrew Sullivan wrote:
 On Fri, May 25, 2007 at 12:36:20PM -0400, Chris Browne wrote:
 We seem to be suffering, as the community, and patch queue, grows,
 from the problem that features that are regarded as being useful only
 to small sets of users are seeing greater reluctance for acceptance.
 
 Another way of expressing that regards it as a positive benefit:
 given finite numbers of developers, testers, and patch reviewers, we
 as a community have to make decisions about how big a feature set we
 can realistically support, and the value that contributes to the user
 community.  A small potential user community probably means a lower
 estimation of the value of the feature.  So features that seem sort
 of boutique are to be regarded at least with scepticism, in order to
 keep the code useful for everyone.

Except that seemingly boutique features can be road-blocks to
implementing projects, which means that you never hear from them.

In my case, there are two such road-blocks:
1. transaction failure on statement failure[0], and
2. single-threaded backups[1].


[0] Savepoints are a work-around, but there's a lot of existing code
that would have to be modified.  And you need a savepoint for every
INSERT and UPDATE.

[1] Tarballing data directories and saving WAL files works around
that, but a pg_dump file is, in itself, a transactionaly consistent
database.  Shipping a bunch of tarballs and WALs to the development
team is much more complicated than a single (or multiple, if that
ever comes to pass) dump file.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Andrew Sullivan
On Fri, May 25, 2007 at 12:59:25PM -0500, Ron Johnson wrote:
 Except that seemingly boutique features can be road-blocks to
 implementing projects, which means that you never hear from them.

Yes.  That's a risk that free software projects take, alas.  If you
can't force your users to tell you what they're doing, then you can't
be sure you have the right picture of what they're doing.

 1. transaction failure on statement failure[0], and

I personally regard that as a feature, not a bug, so I'd be opposed
to changing it.

 2. single-threaded backups[1].

This one has always seemed like a nasty limitation to me, but given
the desire (which you have, apparently) to get transaction-bounds
consistency and the design of postgres, I don't see an easy path to
fixing this.

Something that is important to note, however, is that a group of
users who have enough of a desire for a feature that they code it up,
offer to support it, and make sure it gets reviewed (this one is
important!) are going to have an easier time of it.  What this means
is that users need to dedicate resources to things that aren't
obviously in their own immediate interests, like patch review, so
that when they later say, "This patch works", there is a stronger
probability the community will take seriously their claims that the
patch works correctly.  Free software never comes without costs, and
this is one of them.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Information security isn't a technological problem.  It's an economics
problem.
--Bruce Schneier



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Ron Johnson

On 05/25/07 13:09, Andrew Sullivan wrote:
 On Fri, May 25, 2007 at 12:59:25PM -0500, Ron Johnson wrote:
 Except that seemingly boutique features can be road-blocks to
 implementing projects, which means that you never hear from them.
 
 Yes.  That's a risk that free software projects take, alas.  If you
 can't force your users to tell you what they're doing, then you can't
 be sure you have the right picture of what they're doing.
 
 1. transaction failure on statement failure[0], and
 
 I personally regard that as a feature, not a bug, so I'd be opposed
 to changing it.

Why?  Not failing the whole transaction lets me decide how to handle
that *one* statement error, without having to code for retrying the
whole transaction.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Andrew Sullivan
On Fri, May 25, 2007 at 01:26:02PM -0500, Ron Johnson wrote:
  1. transaction failure on statement failure[0], and
  
  I personally regard that as a feature, not a bug, so I'd be opposed
  to changing it.
 
 Why?  Not failing the whole transaction lets me decide how to handle
 that *one* statement error, without have to code for retrying the
 whole transaction.

Because it's entailed by the definition of atomicity.  Codd put it
this way: "Such a block constitutes a transaction if, during its
execution, either all parts succeed or none succeeds."  If you want
to get around this, you can put a subtransaction around every
statement.

You say you don't want to do the latter, but there's no reason your
client couldn't do it for you, in much the same way we have
AUTOCOMMIT modes.  Indeed, PL/pgSQL actually does this sort of trick
in order to get exception handling.  
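(The per-statement subtransaction pattern is mechanical; a sketch, with
'orders' as a made-up table:)

    BEGIN;
    SAVEPOINT s;
    INSERT INTO orders VALUES (1, 'widget');  -- suppose this one fails
    ROLLBACK TO SAVEPOINT s;                  -- undoes just that statement
    SAVEPOINT s;                              -- re-arm before the next statement
    INSERT INTO orders VALUES (2, 'gadget');  -- the transaction carries on
    COMMIT;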

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The year's penultimate month is not in truth a good way of saying
November.
--H.W. Fowler



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Erik Jones

On May 25, 2007, at 11:36 AM, Chris Browne wrote:


[EMAIL PROTECTED] (Erik Jones) writes:

On May 24, 2007, at 5:21 PM, Chris Browne wrote:
Jan Wieck had a proposal to a similar effect, namely to give some way
to get one connection to duplicate the state of another one.

This would permit doing a neat parallel decomposition of pg_dump: you
could do a 4-way parallelization of it that would function something
like the following [elided]:


Interesting.  That's actually pretty close to the reindexing strategy/
script that I use, and I've been planning on extending it to a vacuum
strategy.  So, I will add my support for someone building this kind
of capability into pg_dump/restore.


Well, I think that particular idea is dead for 8.3, as there wasn't
agreement that there were enough relevant use-cases.

If discussion gets bombarded with "yes, yes, that's useful for me
too!" responses the next time it gets proposed, then that will
increase the chances of acceptance.

We seem to be suffering, as the community and patch queue grow,
from the problem that features regarded as being useful only
to small sets of users are seeing greater reluctance for acceptance.
--


Well, in the current context, I'm less interested in shared  
transaction state across processes and more in the ability to speed  
up dumps with processes working in parallel.  However, given that  
shared transaction state would be necessary for a transactionally  
consistent dump, I guess you could say that my interest there lies in  
it as a means to an end.


As far as use cases go, it's really only useful for large databases
(in our case, shortening our dump time to significantly less than 12
hours), and from what I've heard, until recently Postgres hasn't seen
much of a user base with seriously large databases, at least ones that
share their stats with the community.  Now that Postgres has garnered
a reputation as a rock-solid database, and I'm seeing more and
more books and online tutorials pushing Postgres (as opposed to the
previous de facto choice, MySQL), that will most definitely change.


And, to finish up, is there any reason that pg_restore couldn't  
already work with separate processes working in parallel?


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Alvaro Herrera
Erik Jones wrote:

 And, to finish up, is there any reason that pg_restore couldn't  
 already work with separate processes working in parallel?

The problem is that the ordering of objects in the dump is the only
thing that makes the dump consistent with regard to the dependencies of
objects.  So pg_restore cannot make any assumptions about the
parallelisability of restoring the objects in the dump.

pg_dump is the only one that has the dependency information.

If that information were to be saved in the dump, then maybe pg_restore
could work in parallel.  But it seems a fairly non-trivial thing to do.

Mind you, while I am idly musing on this idea, it seems that just having
multiple processes generating a dump is not such a hot idea by itself,
because you then have no clue how to order the restoration of the
multiple files that are going to result.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"To have more, you must desire less"



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Erik Jones wrote:
 And, to finish up, is there any reason that pg_restore couldn't  
 already work with separate processes working in parallel?

 The problem is that the ordering of objects in the dump is the only
 thing that makes the dump consistent with regard to the dependencies of
 objects.  So pg_restore cannot make any assumptions about the
 parallelisability of restoring the objects in the dump.

That's true at the level of DDL operations, but AFAIK we could
parallelize table-loading and index-creation steps pretty effectively
--- and that's where all the time goes.

A more interesting question is what sort of hardware you need for that
actually to be a win, though.  Loading a few tables in parallel sounds
like an ideal recipe for oversaturating your disk bandwidth...

regards, tom lane



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Andrew Sullivan
On Fri, May 25, 2007 at 05:28:43PM -0400, Tom Lane wrote:
 That's true at the level of DDL operations, but AFAIK we could
 parallelize table-loading and index-creation steps pretty effectively
 --- and that's where all the time goes.

I made a presentation at OSCON a few years ago about how we did it
that way when we imported .org.  We had limited time to work in, and
we had to do a lot of validation, so getting the data in quickly was
important.  So we split the data files up into segments and loaded
them in parallel (Chris Browne did most of the implementation of
this.)  It was pretty helpful for loading, anyway.

 A more interesting question is what sort of hardware you need for that
 actually to be a win, though.  Loading a few tables in parallel sounds
 like an ideal recipe for oversaturating your disk bandwidth...

Right, you need to be prepared for that.  But of course, if you're in
the situation where you have to get a given database up and running,
who cares about the disk bandwidth? -- you don't have the database
running yet.  The kind of system that is busy enough to have that
size of database and that urgency of recovery is also the kind that
is likely to have dedicated storage hardware for that database.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Unfortunately reformatting the Internet is a little more painful 
than reformatting your hard drive when it gets out of whack.
--Scott Morris



Re: [GENERAL] why postgresql over other RDBMS

2007-05-25 Thread Chris Browne
[EMAIL PROTECTED] (Alvaro Herrera) writes:
 Erik Jones wrote:

 And, to finish up, is there any reason that pg_restore couldn't  
 already work with separate processes working in parallel?

 The problem is that the ordering of objects in the dump is the only
 thing that makes the dump consistent with regard to the dependencies of
 objects.  So pg_restore cannot make any assumptions about the
 parallelisability of restoring the objects in the dump.

 pg_dump is the only one that has the dependency information.

 If that information were to be saved in the dump, then maybe pg_restore
 could work in parallel.  But it seems a fairly non-trivial thing to do.

 Mind you, while I am idly musing on this idea, it seems that just having
 multiple processes generating a dump is not such a hot idea by itself,
 because you then have no clue how to order the restoration of the
 multiple files that are going to result.

I think it's less bad than you think.

The really time-consuming bits of pg_restore are:

1. the loading of table data
2. creating indices on those tables
3. (a distant third) setting up R/I constraints

If you look at the present structure of pg_dump output, those are all
pretty visibly separate steps.

pg_dump output [loosely] consists of:
- type definitions & such
- table definitions
- loading of table data (i.e., step 1)
- stored function definitions
- indices (i.e., parts of step 2)
- primary keys (i.e., the rest of step 2)
- triggers + rules (including step 3)

Thus, a parallel load would start by doing some things in a serial
fashion, namely creating types and tables.  This isn't a
parallelizable step, but so what?  It shouldn't take very long.

The parallel load can load as many tables concurrently as you choose;
since there are no indices or R/I triggers, those are immaterial
factors.

Generating indices and primary keys could, again, be parallelized
pretty heavily, and have (potentially) heavy benefit.

Furthermore, an interesting thing to do might be to use the same
approach that Slony-I does, at present, for subscriptions.  It
temporarily deactivates triggers and indices while loading the data,
then reactivates them, and requests a re-index.  That would permit
loading the *entire* schema, save for data, and then loading and indexing
with pretty much the maximum possible efficiency.
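In present-day SQL that trick looks roughly like this (a sketch; the
table name is invented, and Slony-I's actual index deactivation is
more invasive than a plain reindex):

    -- load data with triggers (including FK checks) disabled
    ALTER TABLE mytable DISABLE TRIGGER ALL;
    COPY mytable FROM '/path/to/mytable.dat';
    ALTER TABLE mytable ENABLE TRIGGER ALL;

    -- then rebuild all of the table's indexes in one pass
    REINDEX TABLE mytable;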

That seems like a not-completely-frightening SMOP (simple matter of
programming).  Not completely trivial, but not frighteningly
non-trivial...
-- 
(format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com)
http://linuxdatabases.info/info/linuxdistributions.html
Developmental Psychology
Schoolyard behavior resembles adult primate behavior because Ontogeny
Recapitulates Phylogeny doesn't stop at birth.
-- Mark Miller



[GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Jasbinder Singh Bali

Hi
I was wondering: apart from extensive procedural language support and being
free, what are the other major advantages of PostgreSQL over other major
RDBMSs like Oracle and SQL Server?

Any pointers would be highly appreciated.

Thanks,
~Jas


Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Tom Allison

You've addressed cost and performance.
Not much left.

Try it out for yourself and see if it works for you.

On May 24, 2007, at 4:06 AM, Jasbinder Singh Bali wrote:


Hi
I was wondering: apart from extensive procedural language support
and being free, what are the other major advantages of PostgreSQL
over other major RDBMSs like Oracle and SQL Server?


Any pointers would be highly appreciated.

Thanks,
~Jas





Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Richard Huxton

Tom Allison wrote:

You've addressed cost and performance.
Not much left.

Try it out for yourself and see if it works for you.


+ elephant in logo
- unpronounceable name
+ excellent mailing lists
+ excellent developer community
- you can download as many copies as you like and a salesman still won't 
take you out to lunch


If you want specific vs each DB...

MSSQL obviously integrates nicely with the rest of the MS developer
tools. If you're planning a .NET (TM) deployment over a range of Windows
(TM) systems and have a lot of experienced MS developers in-house, then
it's perhaps the default choice.


Oracle has much more experience running on top-end hardware than PG. If 
you've got the in-depth knowledge and/or the money then you can push it 
further.


On the other hand, I can buy a 2-CPU x 4-core machine with 16GB RAM and
half a dozen disks for £5k from Dell (not that you necessarily would).
That's a lot of oomph for the money - think what it would have cost five
years ago.


Add Debian + PostgreSQL, total cost = £5k.

Add Windows + SQL Server, total cost = £12k.

--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Wiebe Cazemier
On Thursday 24 May 2007 10:06, Jasbinder Singh Bali wrote:

 Hi
 I was wondering: apart from extensive procedural language support and being
 free, what are the other major advantages of PostgreSQL over other major
 RDBMSs like Oracle and SQL Server?
 
 Any pointers would be highly appreciated.
 
 Thanks,
 ~Jas

A free (as in free speech) license. You can still use PostgreSQL if the
developers decide to stop making Postgres, or stop making it free, as it is
likely it will be forked and continued by others (like XFree86 forked into
Xorg).

And, you won't run into annoying licence agreements saying you can't use it on
machines with two CPUs...

In my experience as a sysadmin, I much prefer freely licensed (open source)
software over commercial software.

Something to think about.




Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Ben Trewern
Better support!

Where else can you get feedback from the actual programmers (sometimes 
within minutes of writing a message) than here?

Ben

 Hi
 I was wondering: apart from extensive procedural language support
 and being free, what are the other major advantages of PostgreSQL
 over other major RDBMSs like Oracle and SQL Server?

 Any pointers would be highly appreciated.

 Thanks,
 ~Jas 





Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Alexander Staubo

On 5/24/07, Richard Huxton [EMAIL PROTECTED] wrote:

Tom Allison wrote:
 You've addressed cost and performance.
 Not much left.

 Try it out for yourself and see if it works for you.

+ elephant in logo
- unpronounceable name
+ excellent mailing lists
+ excellent developer community
- you can download as many copies as you like and a salesman still won't
take you out to lunch


+ Friendly toolset in the box. [1]
+ Transactional DDL. [2]
+ Table inheritance, if you care to use unportable features.
+ Extensibility. [3]
+ PostGIS for spatial extensions.
- Replication offerings suck.
- Warm standby involves a lot of manual labour.
- Point-in-time recovery involves a lot of manual labour.

[1] psql+readline, pg_dump etc. are a breeze compared to crusty Oracle
tools; psql feels distinctly modern compared to MySQL's crummy
interpreter.

[2] Nobody else has this, I believe, except possibly Ingres and
NonStop SQL. This means you can do a "begin transaction", then issue
"create table", "alter table", etc. ad nauseam, and in the meantime
concurrent transactions will just work. Beautiful for atomically
upgrading a production server. Oracle, of course, commits after each
DDL statement.
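A minimal illustration of what that buys you (the table name is
invented):

    BEGIN;
    CREATE TABLE upgrade_test (id integer PRIMARY KEY);
    ALTER TABLE upgrade_test ADD COLUMN note text;
    -- concurrent transactions don't see upgrade_test yet
    ROLLBACK;  -- or COMMIT to publish the whole change set atomically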

[3] PostgreSQL can be extended with new domain types, stored-procedure
languages (e.g., PL/Python, PL/Perl), functions (e.g., dblink,
fuzzystrmatch, cube), and indexes (GiST, GIN). Together this allows
projects such as TSearch2 and PostGIS to be implemented as separate
extensions to PostgreSQL.
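For instance, a new domain type takes only a couple of lines (a
sketch; the domain name and check expression are invented):

    CREATE DOMAIN us_postal_code AS text
        CHECK (VALUE ~ '^[0-9]{5}$' OR VALUE ~ '^[0-9]{5}-[0-9]{4}$');

    CREATE TABLE addresses (street text, zip us_postal_code);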

Alexander.



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Ron Johnson

On 05/24/07 10:30, Alexander Staubo wrote:
 On 5/24/07, Richard Huxton [EMAIL PROTECTED] wrote:
 Tom Allison wrote:
  You've addressed cost and performance.
  Not much left.
 
  Try it out for yourself and see if it works for you.

 + elephant in logo
 - unpronounceable name
 + excellent mailing lists
 + excellent developer community
 - you can download as many copies as you like and a salesman still won't
 take you out to lunch
 
 + Friendly toolset in the box. [1]
 + Transactional DDL. [2]
 + Table inheritance, if you care to use unportable features.
 + Extensibility. [3]
 + PostGIS for spatial extensions.
 - Replication offerings suck.
 - Warm standby involves a lot of manual labour.
 - Point-in-time recovery involves a lot of manual labour.
 
 [1] psql+readline, pg_dump etc. are a breeze compared to crusty Oracle
 tools; psql feels distinctly modern compared to MySQL's crummy
 interpreter.
 
 [2] Nobody else has this, I believe, except possibly Ingres and
 NonStop SQL. This means you can do a "begin transaction", then issue
 "create table", "alter table", etc. ad nauseam, and in the meantime
 concurrent transactions will just work. Beautiful for atomically
 upgrading a production server. Oracle, of course, commits after each
 DDL statement.

Rdb/VMS and CODASYL DBMS (both Oracle, formerly DEC, products) also
have transactional DDL.  Actually, I was quite stunned to discover
that Oracle doesn't do that.

Interbase/Firebird probably also has transactional DDL.

 [3] PostgreSQL can be extended with new domain types, stored-procedure
 languages (e.g., PL/Python, PL/Perl), functions (e.g., dblink,
 fuzzystrmatch, cube), and indexes (GiST, GIN). Together this allows
 projects such as TSearch2 and PostGIS to be implemented as separate
 extensions to PostgreSQL.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Ericson Smith

For us?

Stability. Pure and simple. It ended up that the site was faster too.
One issue with PostgreSQL is that connection time is slower than
MySQL's. Otherwise almost everything else is just as fast or faster.

So with Ruby on Rails, there are persistent connections from the
container, and that disadvantage disappears. If there's anything slow
in the site, it's Ruby template rendering; database access is generally
10% or less of the total time it takes to generate our most popular pages.

Ericson Smith
Developer
http://www.funadvice.com





Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Alexander Staubo

On 5/24/07, Ron Johnson [EMAIL PROTECTED] wrote:

 [2] Nobody else has this, I believe, except possibly Ingres and
 NonStop SQL. This means you can do a "begin transaction", then issue
 "create table", "alter table", etc. ad nauseam, and in the meantime
 concurrent transactions will just work. Beautiful for atomically
 upgrading a production server. Oracle, of course, commits after each
 DDL statement.

Rdb/VMS and CODASYL DBMS (both Oracle, formerly DEC, products) also
have transactional DDL.


Not exactly mainstream databases, and Codasyl isn't even a relational
database as far as I remember.


Interbase/Firebird probably also has transactional DDL.


Apparently. I'm surprised, given how InterBase and Firebird otherwise
feel so awfully ancient.

All right, so not unique, but very rare.

Alexander.



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Erik Jones


On May 24, 2007, at 4:39 AM, Richard Huxton wrote:


- unpronounceable name


post-gres-queue-el

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Ron Johnson

On 05/24/07 12:48, Alexander Staubo wrote:
 On 5/24/07, Ron Johnson [EMAIL PROTECTED] wrote:
  [2] Nobody else has this, I believe, except possibly Ingres and
  NonStop SQL. This means you can do a "begin transaction", then issue
  "create table", "alter table", etc. ad nauseam, and in the meantime
  concurrent transactions will just work. Beautiful for atomically
  upgrading a production server. Oracle, of course, commits after each
  DDL statement.

 Rdb/VMS and CODASYL DBMS (both Oracle, formerly DEC, products) also
 have transactional DDL.
 
 Not exactly mainstream databases, and Codasyl isn't even a relational
 database as far as I remember.

True, they are legacy systems, but they are still actively developed by
Oracle for Alpha and Itanium VMS.  (The VAX products are still
maintained, but only for bug fixes.)

And yes, DBMS uses the network model.  (Since both it and Rdb use
the same back end and on-disk structures, it's not that much extra
effort to also keep on actively developing DBMS.)

 Interbase/Firebird probably also has transactional DDL.
 
 Apparently. I'm surprised, given how InterBase and Firebird otherwise
 feel so awfully ancient.

InterBase was developed by a man who worked for DEC at the same time
that Rdb was in the works.  His DECisms are still apparent in Firebird.

God, I'm going to hate it when we migrate to Oracle...

 All right, so not unique, but very rare.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Leif B. Kristensen
On Thursday 24. May 2007 19:57, Erik Jones wrote:
On May 24, 2007, at 4:39 AM, Richard Huxton wrote:
 - unpronounceable name

post-gres-queue-el

Somebody probably wants to put that pot-grass-kewl thing in his pipe and 
smoke it.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Wiebe Cazemier
On Thursday 24 May 2007 17:30, Alexander Staubo wrote:

 [2] Nobody else has this, I believe, except possibly Ingres and
 NonStop SQL. This means you can do a "begin transaction", then issue
 "create table", "alter table", etc. ad nauseam, and in the meantime
 concurrent transactions will just work. Beautiful for atomically
 upgrading a production server. Oracle, of course, commits after each
 DDL statement.

If this is such a rare feature, I'm very glad we chose PostgreSQL. I use it all
the time, and wouldn't know what to do without it. We circumvented Ruby on
Rails' migrations, and just implemented them in SQL. Writing migrations is a
breeze this way, and you don't have to hassle with atomicity, or the pain of
discovering the migration doesn't work on the production server.






Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread A.M.


On May 24, 2007, at 14:29 , Wiebe Cazemier wrote:


On Thursday 24 May 2007 17:30, Alexander Staubo wrote:


[2] Nobody else has this, I believe, except possibly Ingres and
NonStop SQL. This means you can do a "begin transaction", then issue
"create table", "alter table", etc. ad nauseam, and in the meantime
concurrent transactions will just work. Beautiful for atomically
upgrading a production server. Oracle, of course, commits after each
DDL statement.


If this is such a rare feature, I'm very glad we chose PostgreSQL. I use it
all the time, and wouldn't know what to do without it. We circumvented Ruby
on Rails' migrations, and just implemented them in SQL. Writing migrations
is a breeze this way, and you don't have to hassle with atomicity, or the
pain of discovering the migration doesn't work on the production server.


Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
subtransaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.

Unfortunately, the PostgreSQL architecture wouldn't lend itself well
to this. Still, it seems like a basic extension of the notion of
subtransactions.


Cheers,
M



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Alvaro Herrera
A.M. wrote:
 
 On May 24, 2007, at 14:29 , Wiebe Cazemier wrote:
 
 On Thursday 24 May 2007 17:30, Alexander Staubo wrote:
 
 [2] Nobody else has this, I believe, except possibly Ingres and
 NonStop SQL. This means you can do a "begin transaction", then issue
 "create table", "alter table", etc. ad nauseam, and in the meantime
 concurrent transactions will just work. Beautiful for atomically
 upgrading a production server. Oracle, of course, commits after each
 DDL statement.
 
 If this is such a rare feature, I'm very glad we chose PostgreSQL. I use it
 all the time, and wouldn't know what to do without it. We circumvented Ruby
 on Rails' migrations, and just implemented them in SQL. Writing migrations
 is a breeze this way, and you don't have to hassle with atomicity, or the
 pain of discovering the migration doesn't work on the production server.
 
 Indeed. Wouldn't it be a cool feature to persist transaction states
 across connections so that a new connection could get access to a
 subtransaction state? That way, you could make your schema changes and
 test them with any number of test clients (which designate the state
 to connect with) and then you would commit when everything works.

 Unfortunately, the PostgreSQL architecture wouldn't lend itself well
 to this. Still, it seems like a basic extension of the notion of
 subtransactions.

Hmm, doesn't this Just Work with two-phase commit?
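(For reference, a minimal two-phase-commit sketch; the transaction
name is invented, and max_prepared_transactions must be set above
zero:)

    BEGIN;
    CREATE TABLE upgrade_test (id integer);
    PREPARE TRANSACTION 'schema_upgrade';  -- state survives disconnect

    -- later, possibly from a different connection:
    COMMIT PREPARED 'schema_upgrade';      -- or ROLLBACK PREPARED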

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1, W 73º 13' 56.4
Oh, oh, the Galacian girls will do it for pearls,
And the Arrakeen for water! But if you desire dames
Like consuming flames, try a Caladanin daughter! (Gurney Halleck)



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread A.M.


On May 24, 2007, at 15:57 , Alvaro Herrera wrote:


A.M. wrote:


Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
subtransaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.

Unfortunately, the PostgreSQL architecture wouldn't lend itself well
to this. Still, it seems like a basic extension of the notion of
subtransactions.


Hmm, doesn't this Just Work with two-phase commit?


2PC requires that the modifications already be set in concrete. What I
suggest is a method for a new connection to insert itself into an
existing (sub-)transaction SQL stream, make changes, and commit to
the root or parent transaction.

In the scenario where changes are pending, only one connection can
test the changes: it must be the connection that opened the
transaction. Concurrency issues cannot be tested before committing,
for example.

The implementation could be as simple as decoupling connections from
transactions; then a connection could make serialized requests to
other backends. A proof of concept could certainly be cobbled
together with pipes and PL/Perl, but the real beef would be the
ability to pass responsibility for transactions from one connection
to another.


Cheers,
M



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Alexander Staubo

We're derailing the thread, but...

On 5/24/07, A.M. [EMAIL PROTECTED] wrote:

2PC requires that the modifications already be set in concrete. What I
suggest is a method for a new connection to insert itself into an
existing (sub-)transaction SQL stream, make changes, and commit to
the root or parent transaction.


The problem with long-running transactions is that they need to avoid
locking the resources they touch. Short-running transactions are bad
enough as they stand -- until fairly recently (8.1? 8.2?), merely
inserting or updating a row that had a foreign-key reference to
another table row would lock the referenced row until the end of the
transaction, by issuing an implicit "select ... for update".
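A sketch of the effect (the tables are invented):

    CREATE TABLE parent (id integer PRIMARY KEY);
    CREATE TABLE child  (pid integer REFERENCES parent(id));
    INSERT INTO parent VALUES (1);

    -- session 1:
    BEGIN;
    INSERT INTO child VALUES (1);  -- the FK check locks the parent row

    -- session 2: another insert referencing the same parent row.
    -- Under the old FOR UPDATE-style FK locks this blocked until
    -- session 1 finished; with shared row locks it can proceed.
    INSERT INTO child VALUES (1);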

Although a mechanism whereby multiple connections can share a single
session/transaction is probably easy to implement, using long-running
transactions to isolate DDL changes is not feasible at the moment
because PostgreSQL currently acquires an AccessExclusiveLock on the
modified table until the transaction ends, which means that concurrent
transactions would be blocked from even querying the table.

I don't know PostgreSQL's internals, so I can only postulate that this
locking occurs because PostgreSQL holds a single copy of the schema
and related bookkeeping structures in memory.

Alexander.



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread PFC



Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
subtransaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.


	Actually, you can hack this by starting your webserver with only one thread,
using persistent connections, and disabling all commits in the application.

But I'd call that a very hackish hack.



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread A.M.


On May 24, 2007, at 18:12 , PFC wrote:




Indeed. Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
subtransaction state? That way, you could make your schema changes and
test them with any number of test clients (which designate the state
to connect with) and then you would commit when everything works.


	Actually, you can hack this by starting your webserver with only one
thread, using persistent connections, and disabling all commits in the
application.

But I'd call that a very hackish hack.


Not really: then I can't use any transactions, in which case I might
as well use MySQL. I would like to be able to pass transaction state
between connections.


-M




Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Chris Browne
[EMAIL PROTECTED] (A.M.) writes:
 On May 24, 2007, at 14:29 , Wiebe Cazemier wrote:

 On Thursday 24 May 2007 17:30, Alexander Staubo wrote:

 [2] Nobody else has this, I believe, except possibly Ingres and
 NonStop SQL. This means you can do a "begin transaction", then issue
 "create table", "alter table", etc. ad nauseam, and in the meantime
 concurrent transactions will just work. Beautiful for atomically
 upgrading a production server. Oracle, of course, commits after each
 DDL statement.

 If this is such a rare feature, I'm very glad we chose PostgreSQL. I use it
 all the time, and wouldn't know what to do without it. We circumvented Ruby
 on Rails' migrations, and just implemented them in SQL. Writing migrations
 is a breeze this way, and you don't have to hassle with atomicity, or the
 pain of discovering the migration doesn't work on the production server.

 Indeed. Wouldn't it be a cool feature to persist transaction states
 across connections so that a new connection could get access to a
 subtransaction state? That way, you could make your schema changes and
 test them with any number of test clients (which designate the state
 to connect with) and then you would commit when everything works.

 Unfortunately, the PostgreSQL architecture wouldn't lend itself well
 to this. Still, it seems like a basic extension of the notion of
 subtransactions.

Jan Wieck had a proposal to a similar effect, namely to give some way
to get one connection to duplicate the state of another one.

This would permit doing a neat parallel decomposition of pg_dump: you
could do a 4-way parallelization of it that would function something
like the following:

- connection 1 opens, establishes the usual serialized mode transaction

- connection 1 dumps the table metadata into one or more files in a
  specified directory

- then it forks 3 more connections, and seeds them with the same
  serialized mode state

- it then goes thru and can dump 4 tables concurrently at a time,
  one apiece to a file in the directory.

This could considerably improve the speed of dumps, and possibly of
restores, too.
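Each worker connection would run something like this (a sketch; the
step marked below is the piece PostgreSQL doesn't currently expose):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- here the worker would need to adopt the seed connection's
    -- snapshot, which is the part that doesn't exist yet
    COPY mytable TO STDOUT;  -- one table per worker
    COMMIT;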

Note that this isn't related to subtransactions...
-- 
output = reverse(ofni.secnanifxunil @ enworbbc)
http://www3.sympatico.ca/cbbrowne/internet.html
Unless  there really are  chrono-synclastic infundibula. (Roll on the
Church of God the Utterly Indifferent...) -- Will Deakin



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Thomas Kellerer

Alexander Staubo wrote on 24.05.2007 17:30:

[2] Nobody else has this, I believe, except possibly Ingres and
NonStop SQL. This means you can do a "begin transaction", then issue
"create table", "alter table", etc. ad nauseam, and in the meantime
concurrent transactions will just work. Beautiful for atomically
upgrading a production server. Oracle, of course, commits after each
DDL statement.


I do have to commit a CREATE TABLE in SQL Server (Express) 2005 (and I believe
in 2000 as well), and I can roll back a DROP TABLE.

I haven't checked how this behaves with concurrent access though.

Thomas






Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Alvaro Herrera
Chris Browne wrote:
 [EMAIL PROTECTED] (A.M.) writes:

 Jan Wieck had a proposal to a similar effect, namely to give some way
 to get one connection to duplicate the state of another one.
 
 This would permit doing a neat parallel decomposition of pg_dump: you
 could do a 4-way parallelization of it that would function something
 like the following:

Interesting ...

 Note that this isn't related to subtransactions...

I think what needs to happen is that the snapshot is duplicated.
Probably not that hard to do actually ... the difficult thing is coming
up with an API that makes sense.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"I agree with you that absolute truth does not exist...
The problem is that lies do exist, and you are lying" (G. Lama)



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread A.M.


On May 24, 2007, at 18:21 , Chris Browne wrote:


Jan Wieck had a proposal to a similar effect, namely to give some way
to get one connection to duplicate the state of another one.

This would permit doing a neat parallel decomposition of pg_dump: you
could do a 4-way parallelization of it that would function something
like the following:

- connection 1 opens, establishes the usual serialized mode transaction


- connection 1 dumps the table metadata into one or more files in a
  specified directory

- then it forks 3 more connections, and seeds them with the same
  serialized mode state

- it then goes thru and can dump 4 tables concurrently at a time,
  one apiece to a file in the directory.

This could considerably improve speed of dumps, possibly of restores,
too.

Note that this isn't related to subtransactions...


Well, I was thinking that since transactions are now serializable, it  
should be possible to move the state between existing open transactions.


-M



Parallel backups (was Re: [GENERAL] why postgresql over other RDBMS)

2007-05-24 Thread Ron Johnson

On 05/24/07 17:21, Chris Browne wrote:
[snip]
 
 This would permit doing a neat parallel decomposition of pg_dump: you
 could do a 4-way parallelization of it that would function something
 like the following:
 
 - connection 1 opens, establishes the usual serialized mode transaction
 
 - connection 1 dumps the table metadata into one or more files in a
   specified directory
 
 - then it forks 3 more connections, and seeds them with the same
   serialized mode state
 
 - it then goes thru and can dump 4 tables concurrently at a time,
   one apiece to a file in the directory.
 
 This could considerably improve speed of dumps, possibly of restores,
 too.

What about a master thread that establishes the usual serialized
mode transaction and then issues N asynchronous requests to the
database and, as they return with data, pipes the data to N
corresponding writer threads?  Matching N to the number of tape
drives comes to mind.

Yes, the master thread would be the choke point, but CPUs and RAM
are still a heck of a lot faster than disks, so maybe it wouldn't be
such a problem after all.

Of course, if libpq(??) doesn't handle async IO, then it's not such
a good idea after all.

 Note that this isn't related to subtransactions...

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread Andrew Sullivan
On Thu, May 24, 2007 at 03:25:52PM -0400, A.M. wrote:
 Wouldn't it be a cool feature to persist transaction states
 across connections so that a new connection could get access to a
 subtransaction state?

You could do this using an incredibly evil, carefully implemented
hack in a connection pool.  I'm shuddering at the thought of it, to
be honest, so details are left as an exercise for the reader.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton



Re: [GENERAL] why postgresql over other RDBMS

2007-05-24 Thread A.M.


On May 24, 2007, at 20:39 , Andrew Sullivan wrote:


On Thu, May 24, 2007 at 03:25:52PM -0400, A.M. wrote:

Wouldn't it be a cool feature to persist transaction states
across connections so that a new connection could get access to a
subtransaction state?


You could do this using an incredibly evil, carefully implemented
hack in a connection pool.  I'm shuddering at the thought of it, to
be honest, so details are left as an exercise for the reader.


Actually, a sample implementation could be done using stored
procedures and some IPC. It would, however, require that the receiver
poll for requests; the API would probably look very similar to dblink.


-M
