Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Sun, 2007-01-21 at 13:28 -0500, Tom Lane wrote:
 Jaime Casanova [EMAIL PROTECTED] writes:
  On 1/21/07, Simon Riggs [EMAIL PROTECTED] wrote:
  - continue on error i.e. COMMIT can/might succeed - though there are
  still cases where it cannot, such as a serializable exception.
 
  and what should be the behaviour of that? the same as rollback?

No. The behaviour is to continue the transaction even though an error
has occurred, i.e.

BEGIN;

1. INSERT...
success

2. INSERT  VALUES () () ()
--fails with error on 3rd VALUES statement

dynamically re-construct INSERT statement with remaining 2 VALUES
statements

3. INSERT VALUES () ();
success

COMMIT;
work done by 1 and 3 is committed

Behaviour needs to support any error at (2) except serializable
exceptions.
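
As an illustration, a client can approximate this behaviour today by wrapping
each statement in its own savepoint; a minimal sketch (the table and the
failing constraint are invented):

BEGIN;
SAVEPOINT s1;
INSERT INTO t VALUES (1);               -- statement 1, succeeds
RELEASE SAVEPOINT s1;
SAVEPOINT s2;
INSERT INTO t VALUES (2), (3), (NULL);  -- statement 2, fails (e.g. NOT NULL violation)
ROLLBACK TO SAVEPOINT s2;               -- discard only the failed statement
SAVEPOINT s3;
INSERT INTO t VALUES (2), (3);          -- statement 3, re-issued without the bad row
RELEASE SAVEPOINT s3;
COMMIT;                                 -- work from statements 1 and 3 is committed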

 The only conceivable implementation is an implicit savepoint issued
 before each statement.  

Perhaps the only acceptable one.

 By and large that seems to me to be most easily
 handled on the client side, and many of our client libraries already
 have the ability to do it. 

PL/pgSQL supports EXCEPTIONs, but no other clients support it, AFAICS.

  (For instance, psql has ON_ERROR_ROLLBACK.)

That's not the same thing, regrettably.

 If we tried to do it on the server side, we would break any client
 software that wasn't prepared for the change of behavior --- see the 7.3
 autocommit fiasco for an example.

Only if we changed the default behaviour, which I am not suggesting.

 So as far as the server is concerned, I see no TODO here.

If the server team won't allow it, we must document that this behaviour
must be a client-side function in the *server* TODO, so that all the
various client projects can read the same TODO item and implement it.

Implement continue-on-error transactional behaviour for each client
library.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Sun, 2007-01-21 at 14:26 -0600, Jim C. Nasby wrote:
 On Sun, Jan 21, 2007 at 11:39:45AM +, Heikki Linnakangas wrote:
  Russell Smith wrote:
  Strange idea that I haven't researched: given that vacuum can't be run in a
  transaction, it is possible at a certain point to quit the current
  transaction and start another one.  There has been much chat, and now a
  TODO item, about allowing multiple vacuums so as not to starve small tables.
  But if a big table has a long-running vacuum, the vacuum of the small
  table won't be effective anyway, will it?  If vacuum of a big table were
  done in multiple transactions you could reduce the effect of the
  long-running vacuum.  I'm not sure how this affects the rest of the system,
  though.
  
  That was fixed by Hannu Krosing's patch in 8.2 that made vacuum 
  ignore other vacuums in the oldest-xmin calculation.
 
 And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
 will commit and start a new transaction. 

err... It doesn't do this now, and IIRC it didn't do that in 8.1 either.

 That's still useful even with
 Hannu's patch in case you start a vacuum with maintenance_work_mem too
 small; you can abort the vacuum some time later and at least some of the
 work it's done will get committed.

True, but not recommended, for a variety of reasons.

The reason is not intermediate commits, but just that the work of VACUUM
is mostly non-transactional in nature, apart from the various catalog
entries when it completes.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Csaba Nagy
On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote:
[snip]
 To be honest, I'm not a huge fan of psql tricks (error recovery being
 another example), but this could provide a solution.  In your opinion,
 how would you use \if to query the transaction state?

Wouldn't it make sense to introduce instead something like:

\set language plpgsql
... and then redirect everything you type to plpgsql?

That would give you the possibility to execute things in your favorite
language directly from psql without creating a function.

Cheers,
Csaba.



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread org

Hi there,

Is it possible to stop all user access to postgres, but still give access to 
admin?

Just temporarily, not a security setup.

Something like, stop all users but allow user x and y.

thx 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread Russell Smith

[EMAIL PROTECTED] wrote:

Hi there,

Is it possible to stop all user access to postgres, but still give 
access to admin?

Just temporarily, not a security setup.

Something like, stop all users but allow user x and y.
You could restart in single-user mode, or alter pg_hba.conf to allow the 
users you want and disallow all others.


Single-user mode will require you to have direct access to the machine to 
make the alterations.


Using pg_hba.conf will not disconnect existing users, as far as I'm aware.

That's the best advice I can offer; maybe somebody else will be able to 
give you more.


thx


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread Peter Eisentraut
On Monday, 22 January 2007 10:32, [EMAIL PROTECTED] wrote:
 Is it possible to stop all user access to postgres, but still give access
 to admin?

Make the appropriate adjustments to pg_hba.conf.
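
For instance, a temporary pg_hba.conf might look roughly like this (user
names and addresses are only an example; reload the config afterwards, and
note that already-connected sessions are not kicked out):

# allow only the admin role, reject everyone else (first matching line wins)
local   all   postgres                      trust
host    all   postgres    127.0.0.1/32      md5
local   all   all                           reject
host    all   all         0.0.0.0/0         reject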

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Strange file in snapshot tarball

2007-01-22 Thread ITAGAKI Takahiro
Hello,

There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
under the path 'src/interfaces/ecpg/test/expected/' in the recent
ftp snapshot (postgresql-snapshot.tar.gz).

All of the other files are placed under 'postgresql-snapshot/...'.
Is this intentional or a mistake?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.


I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is mostly cached and the index scan doesn't generate any I/O at all.


I believe the heap scans are the biggest issue at the moment.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Strange file in snapshot tarball

2007-01-22 Thread Michael Meskes
On Mon, Jan 22, 2007 at 08:00:26PM +0900, ITAGAKI Takahiro wrote:
 There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
 under the path 'src/interfaces/ecpg/test/expected/' in the recent
 ftp snapshot (postgresql-snapshot.tar.gz).
 
 All of the other files are placed under 'postgresql-snapshot/...'.
 Is this intentional or a mistake?

Would you please care to explain? I do not understand what you are
saying. 

[EMAIL PROTECTED]:~$ tar -ztvf postgresql-snapshot.tar.gz |grep OpenBSD
-rw-r--r-- pgsql/pgsql   32004 2007-01-12 12:31 
postgresql-snapshot/src/interfaces/ecpg/test/expected/compat_informix-dec_test-OpenBSD3.8.broken.stdout
-rw-r--r-- pgsql/pgsql   81524 2007-01-12 12:31 
postgresql-snapshot/src/interfaces/ecpg/test/expected/pgtypeslib-num_test2-OpenBSD3.8.broken.stdout

I don't see the difference here.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Csaba Nagy [EMAIL PROTECTED] wrote:

On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote:
[snip]
 To be honest, I'm not a huge fan of psql tricks (error recovery being
 another example), but this could provide a solution.  In your opinion,
 how would you use \if to query the transaction state?

Wouldn't it make sense to introduce instead something like:

\set language plpgsql
... and then redirect everything you type to plpgsql?

That would give you the possibility to execute things in your favorite
language directly from psql without creating a function.


The nature of pl/pgsql would make this impossible, or at least highly
complex and difficult... one reason is that the language has a much
more complex internal state than SQL.  Most other languages for which I
think this would be worthwhile already have their own immediate-execution
interpreters.

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Heikki Linnakangas

I've been looking at the way we do vacuums.

The fundamental performance issue is that a vacuum generates 
nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to 
spread the cost like part payment, but the total is the same. In an I/O 
bound system, the extra I/O directly leads to less throughput.


Therefore, we need to do less I/O. Dead space map helps by allowing us 
to skip blocks that don't need vacuuming, reducing the # of I/Os to 
2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the 
dead tuples are spread uniformly.


If we could piggyback the vacuum I/Os onto the I/Os that we're doing 
anyway, vacuum ideally wouldn't have to issue any I/O of its own. I've 
tried to figure out a way to do that.


Vacuum is done in 3 phases:

1. Scan heap
2. Vacuum index
3. Vacuum heap

Instead of doing a sequential scan, we could perform the 1st phase by 
watching the buffer pool, scanning blocks for dead tuples when they're 
in memory and keeping track of which pages we've seen. When all pages 
have been seen, the tid list is sorted and 1st phase is done.


In theory, the index vacuum could also be done that way, but let's 
assume for now that indexes would be scanned like they are currently.


The 3rd phase can be performed similarly to the 1st phase. Whenever a 
page enters the buffer pool, we check the tid list and remove any 
matching tuples from the page. When the list is empty, vacuum is complete.


Of course, there are some issues in the design as described above. For 
example, the vacuum might take a long time if there are cold spots in the 
table. In fact, a block full of dead tuples might never be visited again.


A variation of the scheme would be to keep scanning pages that are in 
cache until the tid list reaches a predefined size, instead of keeping 
track of which pages have already been seen. That would deal better with 
tables with hot and cold spots, but it couldn't advance relfrozenxid 
because there would be no guarantee that all pages are visited. Also, we 
could start the 1st phase of the next vacuum while we're still in the 3rd 
phase of the previous one.


Also, after we've seen 95% of the pages or a timeout expires, we could 
fetch the rest of them with random I/O to let the vacuum finish.


I'm not sure how exactly this would be implemented. Perhaps bgwriter or 
autovacuum would do it, or a new background process. Presumably the 
process would need access to relcache.


One issue is that if we're trying to vacuum every table simultaneously 
this way, we'll need more overall memory for the tid lists. I'm hoping 
there's a way to implement this without requiring shared memory for the 
tid lists, as that would make the memory management a nightmare. Also, we'd 
need changes to the bufmgr API to support this.


This would work nicely with the DSM. The list of pages that need to be 
visited in phase 1 could be initialized from the DSM, largely avoiding 
the problem with cold spots.


Any thoughts before I start experimenting?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/21/07, Tom Lane [EMAIL PROTECTED] wrote:

Jaime Casanova [EMAIL PROTECTED] writes:
 On 1/21/07, Simon Riggs [EMAIL PROTECTED] wrote:
 - continue on error i.e. COMMIT can/might succeed - though there are
 still cases where it cannot, such as a serializable exception.

 and what should be the behaviour of that? the same as rollback?

The only conceivable implementation is an implicit savepoint issued
before each statement.


I'm not sure I agree here... before the NT implementation was changed
over to savepoint syntax it was perfectly possible to recover from
errors inside a transaction... and it is still possible, but only in plpgsql
functions.  What I'm asking for is to reopen this behavior
somehow... in the production environments I've worked in, application
update and maintenance relied heavily on scripting, and the lack of this
functionality forces me to wrap the script launch with C code to work
around limitations of the savepoint system.

In pure SQL, we have a 'begin' statement equivalent but no 'end'
statement.  Why not?

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 09:25 -0500, Merlin Moncure wrote:
 On 1/21/07, Tom Lane [EMAIL PROTECTED] wrote:
  Jaime Casanova [EMAIL PROTECTED] writes:
   On 1/21/07, Simon Riggs [EMAIL PROTECTED] wrote:
   - continue on error i.e. COMMIT can/might succeed - though there are
   still cases where it cannot, such as a serializable exception.
 
   and what should be the behaviour of that? the same as rollback?
 
  The only conceivable implementation is an implicit savepoint issued
  before each statement.
 
 I'm not sure I agree here... before the NT implementation was changed
 over to savepoint syntax it was perfectly possible to recover from
 errors inside a transaction... and it is still possible, but only in plpgsql
 functions.  What I'm asking for is to reopen this behavior
 somehow... in the production environments I've worked in, application
 update and maintenance relied heavily on scripting, and the lack of this
 functionality forces me to wrap the script launch with C code to work
 around limitations of the savepoint system.

Could you post an example, just so we're all clear what the problems
are? I thought I understood what you are requesting; I may not.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Simon Riggs [EMAIL PROTECTED] wrote:

Could you post an example, just so we're all clear what the problems
are? I thought I understood what you are requesting; I may not.


ok,

The short version is I would like the ability to run some SQL commands
and recover the transaction if an error occurs.

We have the ability to do this with savepoint... rollback to
savepoint... but these are not useful without introducing an external
language (C, Perl) that can catch the errors and do a rollback to a
savepoint conditionally on the SQL error state.

How would this be useful?
Well when I update production systems I often do this from a master
script that loads smaller scripts from another place:

-- update_production.sql
begin;
\i update_foo.sql
\i update_bar.sql
commit;

any error updating foo or bar will blow up the whole thing.  Maybe
this is desirable, but it is often nice to be able to do some error
handling here.  In the pre-savepoint NT implementation I could:

-- update_production.sql
begin;

begin;
insert into log values ('foo');
\i update_foo.sql
commit;

begin;
insert into log values ('bar');
\i update_bar.sql
commit;

commit;

In between the inner transactions I could check 'log' to see if
everything went through and take appropriate action.  Now, client
applications have the luxury of being able to check the return code of
the query execution call, but SQL-only scripts can't.

This would be perfectly acceptable:

-- update_production.sql
begin;

savepoint foo;
\i update_foo.sql
rollback to savepoint foo [only if it failed];

savepoint bar;
\i update_bar.sql
rollback to savepoint bar [only if it failed];

commit;

This would be just great for scripts, but would also help client-side
programming a bit by introducing more flexible error-handling
behaviors without having to handle things via the returned SQL error
code.  The 'on errors' bit I was talking about earlier is just syntactic
sugar, but the critical part is being able to recover transactions
partially without an external handler...

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Simon Riggs
There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to emitting an ANALYZE statement for a table
when both the table definition and data are dumped. If only data (-a) or
only the table definition (-s) is dumped, the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option. 
-z on | off 
--analyze=on | off

This would add a table-specific ANALYZE statement following each table's
actions.
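
In other words, the emitted dump for each table might then look roughly like
this (table and data are invented for illustration):

CREATE TABLE foo (id integer, val text);

COPY foo (id, val) FROM stdin;
1	one
2	two
\.

ANALYZE foo;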

I'm not aware of a strong argument against such an option. Performance
surely can't be one, because the time saved by skipping the ANALYZE will
quickly be lost to poorly planned queries.

What does the panel think?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 On 1/22/07, Simon Riggs [EMAIL PROTECTED] wrote:
 Could you post an example, just so we're all clear what the problems
 are? I thought I understood what you are requesting; I may not.

 ok,

 The short version is I would like the ability to run some sql commands
 and recover the transaction if an error occurs.

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 10:40:37AM -0500, Merlin Moncure wrote:
 The short version is I would like the ability to run some sql commands

snip

 any error updating foo or bar will blow up the whole thing.  Maybe
 this is desirable, but it is often nice to be able to do some error
 handling here.  In the pre-savepoint NT implementation I could:

snip

Nested transactions are trivially implemented on top of savepoints. If
we're talking about psql, maybe all we need to do is create the
commands in psql:

\begin_nest
\commit_nest
\rollback_nest

Would that suit your purpose?
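
Presumably these would just wrap ordinary savepoints under the hood; a rough
sketch of the expansion, with the generated savepoint name invented:

\begin_nest      -- would issue:  SAVEPOINT psql_nest_1;
 ... statements ...
\commit_nest     -- would issue:  RELEASE SAVEPOINT psql_nest_1;
\rollback_nest   -- would issue:  ROLLBACK TO SAVEPOINT psql_nest_1;
                 --               RELEASE SAVEPOINT psql_nest_1;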

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 ISTM that pg_dump should default to have ANALYZE statements for a table,
 when both table definition and data are dumped. If only data (-a) or
 only table definition (-s) is dumped the default should be to *not* add
 an ANALYZE statement.

Having pg_dump emit ANALYZE was discussed and rejected years ago.
Have you read that discussion?  Do you have any new arguments to make?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Stefan Kaltenbrunner

Simon Riggs wrote:

There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option. 
-z on | off 
--analyze=on | off


This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?


how is this going to interact with the (now by default enabled) 
autovacuum daemon ?



Stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 There doesn't seem to be any reason to skip the ANALYZE, but I'll
 implement it as an option. 
 -z on | off 
 --analyze=on | off

Only an aesthetic comment: 

Short options don't usually take on/off arguments, I would suggest making the
default be to analyze and make -z and --analyze=off disable the analyze.

You might also consider having a --analyze=verbose and perhaps a
--analyze=full though currently that would require doing vacuum analyze.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 10:46 -0500, Tom Lane wrote:
 Merlin Moncure [EMAIL PROTECTED] writes:
  On 1/22/07, Simon Riggs [EMAIL PROTECTED] wrote:
  Could you post an example, just so we're all clear what the problems
  are? I thought I understood what you are requesting; I may not.
 
  ok,
 
  The short version is I would like the ability to run some sql commands
  and recover the transaction if an error occurs.
 
 I'm getting tired of repeating this, but: neither of you have said
 anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
 What exactly is lacking in that feature?

Sorry for not replying to your other post.

ON_ERROR_ROLLBACK doesn't do the same thing, that's why. It shuts out the
noise messages, true, but it doesn't re-execute all of the commands in
the transaction that succeeded, and so breaks the transaction as
originally coded.

BEGIN;
stmt1;
stmt2; -- error
stmt3;
COMMIT;

results in stmt3 completing successfully even though stmt1 and stmt2 do
not == broken script.

The behaviour we've been discussing is when stmt2 fails, to allow stmt3
to be submitted, so that at commit, stmt1 and stmt3 effects will be
successful *if* the user wishes this.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Russell Smith wrote:
  2. Index cleanup is the most expensive part of vacuum.  So doing a 
  partial vacuum actually means more I/O as you have to do index cleanup 
  more often.
 
 I don't think that's usually the case. Index(es) are typically only a 
 fraction of the size of the table, and since 8.2 we do index vacuums in 
 a single scan in physical order. In fact, in many applications the index 
 is mostly cached and the index scan doesn't generate any I/O at all.

Are _all_ the indexes cached?  I would doubt that.  Also, for a typical
table, what percentage of the table's size do all indexes combined amount to?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.
I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is mostly cached and the index scan doesn't generate any I/O at all.


Are _all_ the indexes cached?  I would doubt that.


Well, depends on your schema, of course. In many applications, yes.


 Also, for typical
table, what percentage is the size of all indexes combined?


Well, there's no such thing as a typical table. As an anecdote, here are 
the ratios (total size of all indexes of a table)/(size of the corresponding 
heap) for the bigger tables of a DBT-2 run I have at hand:


Stock:  1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer:   629011 /(5711+20567) = 8%

In any case, for the statement "Index cleanup is the most expensive part 
of vacuum" to be true, your indexes would have to take up 2x as much 
space as the heap, since the heap is scanned twice. I'm sure there are 
databases like that out there, but I don't think it's the common case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 BEGIN;
 stmt1;
 stmt2; -- error
 stmt3;
 COMMIT;

 results in stmt3 completing successfully even though stmt1 and stmt2 do
 not == broken script.

stmt1 would still be completed successfully.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 10:49 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  ISTM that pg_dump should default to have ANALYZE statements for a table,
  when both table definition and data are dumped. If only data (-a) or
  only table definition (-s) is dumped the default should be to *not* add
  an ANALYZE statement.
 
 Having pg_dump emit ANALYZE was discussed and rejected years ago.
 Have you read that discussion?  Do you have any new arguments to make?

Well, it's been suggested before, but I can't see any good arguments
against it. Specifically, there was one person who spoke in favour of it
the last time it was mentioned.

http://archives.postgresql.org/pgsql-hackers/2003-02/msg01270.php

If it's a performance tip, we should be doing it automatically. If we
genuinely believe that autovacuum will handle it, then we can simply
alter the docs to say: if you aren't running autovacuum, then don't
forget to ANALYZE. IMHO it is not sufficient to rely upon autovacuum to
do all of this work for us.

Rod mentions the discussion has been raised before but doesn't state
what the arguments were:
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01264.php

These other posts also seem to be in favour of the idea...
http://archives.postgresql.org/pgsql-performance/2006-10/msg00142.php
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01273.php

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Martijn van Oosterhout kleptog@svana.org wrote:

we're talking about psql, maybe all we need to do is create the
commands in psql:

\begin_nest
\commit_nest
\rollback_nest


That would work if we could roll back conditionally on failure (like
ON_ERROR_ROLLBACK but with definable beginning and ending points).  I
still think we are hacking around the limitations of savepoints, but it
would solve the scripting problem at least.  A general implementation
on the server would benefit everybody.

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 16:11 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  BEGIN;
  stmt1;
  stmt2; -- error
  stmt3;
  COMMIT;
 
  results in stmt3 completing successfully even though stmt1 and stmt2 do
  not == broken script.
 
 stmt1 would still be completed successfully.

OK, understood. ON_ERROR_ROLLBACK is what we need, for psql only.
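
For the record, a minimal psql sketch of that (table and failure are
illustrative):

\set ON_ERROR_ROLLBACK on
BEGIN;
INSERT INTO t VALUES (1);     -- stmt1, succeeds
INSERT INTO t VALUES (NULL);  -- stmt2, fails; psql rolls back its implicit savepoint
INSERT INTO t VALUES (3);     -- stmt3, succeeds
COMMIT;                       -- stmt1 and stmt3 are committed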

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Tom Lane [EMAIL PROTECTED] wrote:

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?


* the ability to span the savepoint across multiple statements.
* the ability to get what you want without wastefully creating a
savepoint before every statement.
* losing some behavior which (IMO) is general and beneficial.  How do
psql tricks help proper stored procedures, should we ever get them?

That being said, some simple extensions to the psql rollback feature
would get the job done I guess.  I'm still not happy with it but I
knew it was a tough go from the beginning...I appreciate everyone's
comments.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 11:21:12AM -0500, Merlin Moncure wrote:
 \begin_nest
 \commit_nest
 \rollback_nest
 
 That would work if we could rollback conditionally on failure (like
 on_error_rollback but with definable beginning and ending points).  I

Sorry, "rollback conditionally on failure" isn't parsing for me. Can
you give some example of what you mean?

 still think we are hacking around limitations of savepoints but it
 would solve the scripting problem at least.  A general implementation
 on the server would benefit everybody.

I don't understand this either. Everything you can do with nested
transactions you can also do with savepoints, so I really don't
understand what the limitations are.

Actually, looking at the savepoint documentation, it looks like there
is no way to say:

if transaction_state ok then
  release X
else
  rollback to X

Which is what a normal COMMIT does (sort of). This is very irritating
for scripting, so maybe a COMMIT TO X command would be a useful
addition?
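
Something along these lines, perhaps (purely hypothetical syntax):

BEGIN;
SAVEPOINT x;
\i update_foo.sql
COMMIT TO x;   -- hypothetical: RELEASE x if no error has occurred,
               -- otherwise ROLLBACK TO x and then RELEASE x
COMMIT;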

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] savepoint improvements

2007-01-22 Thread Dawid Kuroczko

On 1/22/07, Tom Lane [EMAIL PROTECTED] wrote:

 The short version is I would like the ability to run some sql commands
 and recover the transaction if an error occurs.

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?


I think the problem is with doing something like this:

BEGIN;
INSERT INTO foo VALUES ('1'); UPDATE status SET updated=now() WHERE tab='foo';
INSERT INTO bar VALUES ('2'); UPDATE status SET updated=now() WHERE tab='bar';
INSERT INTO baz VALUES ('3'); UPDATE status SET updated=now() WHERE tab='baz';
COMMIT;

This will issue three savepoints (if I understand how things work correctly),
one for each INSERT+UPDATE block.  This way either both of them succeed
or both fail, within one transaction.

Now, I think the problem the OP wanted to solve was having to keep commands
on one line just to have them inside one savepoint, and depending on psql(1)
to issue rollbacks for us.  I think the OP's idea was to be able to roll back
if an error occurred:
BEGIN;
SAVEPOINT s1;
INSERT...
UPDATE...
ROLLBACK TO s1 ON ERROR;
SAVEPOINT s2;
INSERT...
UPDATE...
ROLLBACK TO s2 ON ERROR;
UPDATE job SET ts = now(); -- OK
COMMIT; -- notice lack of rollback -- whole transaction will fail on error

One solution would be a psql command which
would fire a given command on an error condition, like:
BEGIN;
SAVEPOINT s1;
INSERT...
UPDATE...
\on_error ROLLBACK TO s1; INSERT INTO errors 
SAVEPOINT s2;

COMMIT;

  Regards,
 Dawid

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  Heikki Linnakangas wrote:
  Russell Smith wrote:
  2. Index cleanup is the most expensive part of vacuum.  So doing a 
  partial vacuum actually means more I/O as you have to do index cleanup 
  more often.
  I don't think that's usually the case. Index(es) are typically only a 
  fraction of the size of the table, and since 8.2 we do index vacuums in 
  a single scan in physical order. In fact, in many applications the index 
  is mostly cached and the index scan doesn't generate any I/O at all.
  
  Are _all_ the indexes cached?  I would doubt that.
 
 Well, depends on your schema, of course. In many applications, yes.
 
   Also, for typical
  table, what percentage is the size of all indexes combined?
 
 Well, there's no such thing as a typical table. As an anecdote here's 
 the ratios (total size of all indexes of a table)/(size of corresponding 
 heap) for the bigger tables for a DBT-2 run I have at hand:
 
 Stock:1190470/68550 = 6%
 Order_line:   950103/274372 = 29%
 Customer: 629011 /(5711+20567) = 8%
 
 In any case, for the statement Index cleanup is the most expensive part 
 of vacuum to be true, your indexes would have to take up 2x as much 
 space as the heap, since the heap is scanned twice. I'm sure there's 
 databases like that out there, but I don't think it's the common case.

I agree index cleanup isn't 50% of vacuum.  I was trying to figure
out how small it is, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 12:18 -0500, Bruce Momjian wrote:
 Heikki Linnakangas wrote:
  
  In any case, for the statement Index cleanup is the most expensive part 
  of vacuum to be true, your indexes would have to take up 2x as much 
  space as the heap, since the heap is scanned twice. I'm sure there's 
  databases like that out there, but I don't think it's the common case.
 
 I agree it index cleanup isn't  50% of vacuum.  I was trying to figure
 out how small, and it seems about 15% of the total table, which means if
 we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
 80%, assuming 5% of the table is scanned.

Clearly keeping track of what needs vacuuming will lead to a more
efficient VACUUM. Your math applies to *any* design that uses some form
of book-keeping to focus on the hot spots.

On a separate thread, Heikki has raised a different idea for VACUUM.

Heikki's idea asks an important question: where and how should DSM
information be maintained? Up to now everybody has assumed that it would
be maintained when DML took place and that the DSM would be a
transactional data structure (i.e. on-disk). Heikki's idea has
bookkeeping requirements similar to the original DSM concept, but the
interesting aspect is that the DSM information is collected off-line,
rather than being an overhead on every statement's response time.

That idea seems extremely valuable to me.

One of the main challenges is how we cope with large tables that have a
very fine spray of updates against them. A DSM bitmap won't help with
that situation, regrettably.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 I agree it index cleanup isn't  50% of vacuum.  I was trying to figure
 out how small, and it seems about 15% of the total table, which means if
 we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.

That is, if you assume that only 5% of the table will be scanned and you
arrange to do it sequentially, then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.

---

Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  I agree it index cleanup isn't  50% of vacuum.  I was trying to figure
  out how small, and it seems about 15% of the total table, which means if
  we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
  80%, assuming 5% of the table is scanned.
 
 Actually no. A while back I did experiments to see how fast reading a file
 sequentially was compared to reading the same file sequentially but skipping
 x% of the blocks randomly. The results were surprising (to me) and depressing.
 The breakeven point was about 7%.
 
 That is, if you assume that only 5% of the table will be scanned and you
 arrange to do it sequentially then you should expect the i/o to be marginally
 faster than just reading the entire table. Vacuum does do some cpu work and
 wouldn't have to consult the clog as often, so it would still be somewhat
 faster.
 
 The theory online was that as long as you're reading one page from each disk
 track you're going to pay the same seek overhead as reading the entire track.
 I also had some theories involving linux being confused by the seeks and
 turning off read-ahead but I could never prove them.
 
 In short, to see big benefits you would have to have a much smaller percentage
 of the table being read. That shouldn't be taken to mean that the DSM is a
 loser. There are plenty of use cases where tables can be extremely large and
 have only very small percentages that are busy. The big advantage of the DSM
 is that it takes the size of the table out of the equation and replaces it
 with the size of the busy portion of the table. So updating a single record in
 a terabyte table has the same costs as updating a single record in a kilobyte
 table.
 
 Sadly that's not quite true due to indexes, and due to the size of the bitmap
 itself. But going back to your numbers it does mean that if you update a
 single row out of a terabyte table then we'll be removing about 85% of the i/o
 (minus the i/o needed to read the DSM, about .025%). If you update about 1%
 then you would be removing substantially less, and once you get to about 10%
 then you're back where you started.
 
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 Yep, agreed on the random I/O issue.  The larger question is if you have
 a huge table, do you care to reclaim 3% of the table size, rather than
 just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
 to take a lot of time, but vacuuming to reclaim 3% three times seems like
 it is going to be more expensive than just vacuuming the 10% once.  And
 vacuuming to reclaim 1% ten times seems even more expensive.  The
 partial vacuum idea is starting to look like a loser to me again.

Well, the answer is of course: it depends.

If you maintain the dead space at a steady state averaging 1.5% instead of 5%,
your table is 3.33% smaller on average. If this is a DSS system, that will
translate into running your queries 3.33% faster. It will take a lot of
vacuums before they hurt more than a 3%+ performance drop.

If it's an OLTP system then it's harder to figure. A 3.33% increase in data
density will translate to a higher cache hit rate, but how much higher depends
on a lot of factors. In our experiments we actually got a bigger boost in these
kinds of situations than I expected (I expected something comparable to the 3.33%
improvement). So it could be even more than 3.33%. But like I said, it depends.
If you already have the whole database cached you won't see any improvement. If
you are right on the cusp you could see a huge benefit.

It sounds like you're underestimating the performance drain 10% wasted space
has. If we found out that one routine was unnecessarily taking 10% of the cpu
time it would be an obvious focus of attention. 10% wasted space is going to
work out to about 10% of the i/o time.

It also sounds like we're still focused on the performance impact in absolute
terms. I'm much more interested in changing the performance characteristics so
they're predictable and scalable. It doesn't matter much if your 1kb table is
100% slower than necessary but it does matter if your 1TB table needs 1,000x
as much vacuuming as your 1GB table even if it's getting the same update
traffic.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
 Yep, agreed on the random I/O issue.  The larger question is if you have
 a huge table, do you care to reclaim 3% of the table size, rather than
 just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
 to take a lot of time, but vacuuming to reclaim 3% three times seems like
 it is going to be more expensive than just vacuuming the 10% once.  And
 vacuuming to reclaim 1% ten times seems even more expensive.  The
 partial vacuum idea is starting to look like a loser to me again.

Hold that thought! Read Heikki's Piggyback VACUUM idea on the new thread...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Martijn van Oosterhout kleptog@svana.org wrote:

I don't understand this either. Everything you can do with nested
transactions you can also do with savepoints, so I'm really not
understand what the limitations are?

Actually, looking at the savepoint documentation, it looks like there
is no way to say:

if transaction_state ok then
  release X
else
  rollback to X


exactly.


Which is what a normal COMMIT does (sort of). This is very irritating
for scripting, so maybe a COMMIT TO X command would be auseful
addition?


Right, that's exactly what I want (more or less; there are a couple of
different ways to do it, but this is perfectly acceptable).  The 'on
errors' bit was just a froofy addition that distracted from the core
problem.

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Dawid Kuroczko [EMAIL PROTECTED] wrote:

On 1/22/07, Tom Lane [EMAIL PROTECTED] wrote:
  The short version is I would like the ability to run some sql commands
  and recover the transaction if an error occurs.

 I'm getting tired of repeating this, but: neither of you have said
 anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
 What exactly is lacking in that feature?

I think the problem is with doing something like this:

BEGIN;
INSERT INTO foo VALUES ('1'); UPDATE status SET updated=now() WHERE tab='foo';
INSERT INTO bar VALUES ('2'); UPDATE status SET updated=now() WHERE tab='bar';
INSERT INTO baz VALUES ('3'); UPDATE status SET updated=now() WHERE tab='baz';
COMMIT;

This will issue three savepoints (if I understand how things wok correctly),


yes


one for each INSERT+UPDATE block.  This way eiher both of them succeed
or fail, within one transaction.


I think so... Martijn said it best: you can 'rollback' to, but you
can't 'commit' to.  The 'commit to' would arguably be the much more
useful way of disposing of a savepoint.  But that should be taken up
with the SQL standards committee :(.


One solution would be a psql command which
would fire given command on error condition, like:


Yes, psql can handle this. While (IMO) a hack, it addresses the
shortcoming (scripting) specifically not handled by savepoints...

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Alvaro Herrera
Bruce Momjian wrote:
 
 Yep, agreed on the random I/O issue.  The larger question is if you have
 a huge table, do you care to reclaim 3% of the table size, rather than
 just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
 to take a lot of time, but vacuuming to reclaim 3% three times seems like
 it is going to be more expensive than just vacuuming the 10% once.  And
 vacuuming to reclaim 1% ten times seems even more expensive.  The
 partial vacuum idea is starting to look like a loser to me again.

But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table.  Not that extension is a
problem in itself, but it'll keep your working set smaller.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Alvaro Herrera
Merlin Moncure wrote:
 On 1/22/07, Dawid Kuroczko [EMAIL PROTECTED] wrote:

 one for each INSERT+UPDATE block.  This way eiher both of them succeed
 or fail, within one transaction.
 
 i think so...Martijn said it best: you can 'rollback' to, but you
 can't 'commit' to.  The 'commit to' would be the arguably much more
 useful way of disposing of a savepoint.  But that should be taken up
 with sql standards committee :(.

You can RELEASE a savepoint though.
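
For example (table is illustrative), RELEASE simply discards the savepoint
while keeping its changes:

BEGIN;
SAVEPOINT a;
UPDATE t SET val = 1;
RELEASE SAVEPOINT a;   -- changes are kept; the savepoint just goes away
COMMIT;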

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Richard Huxton

Bruce Momjian wrote:

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than just 
paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you can at 
least do something useful with a large table if a full vacuum takes 24 
hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way back 
when isn't directly addressing the issue. What most people want (I'm 
guessing) is to vacuum *when the user-workload allows*, and the 
time-tabling is just a sysadmin's first approximation of that.


With partial vacuuming possible, we can arrange things with just three 
thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.

Once every 10 seconds, the manager compares the current workload to the 
thresholds and starts a new vacuum, kills one, or does nothing. New 
vacuum processes keep getting started as long as there is spare workload 
capacity and tables that need vacuuming.


Now the trick of course is how you measure system workload in a 
meaningful manner.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Gregory Stark wrote:

Bruce Momjian [EMAIL PROTECTED] writes:


I agree it index cleanup isn't  50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.


Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.


Note that with uniformly random updates, you will have dirtied every page of 
the table before you get anywhere near 5% dead space. So we have to 
assume a non-uniform distribution of updates for the DSM to be of any help.


And if we assume non-uniform distribution, it's a good bet that the 
blocks that need vacuuming are also not randomly distributed. In fact, 
they might very well all be in one cluster, so that scanning that 
cluster is indeed sequential I/O.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

 i think so...Martijn said it best: you can 'rollback' to, but you
 can't 'commit' to.  The 'commit to' would be the arguably much more
 useful way of disposing of a savepoint.  But that should be taken up
 with sql standards committee :(.

You can RELEASE a savepoint though.


Not following an error.  RELEASE serves absolutely no purpose
whatsoever; it's like the SQL equivalent of an assembly NOP... it wastes
CPU cycles for no reason.

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Hi,

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:

1. There will be two kinds of processes, autovacuum launcher and
autovacuum worker.

2. The launcher will be in charge of scheduling and will tell workers
what to do

3. The workers will be similar to what autovacuum does today: start when
somebody else tells it to start, process a single item (be it a table or
a database) and terminate

4. Launcher will be a continuously-running process, akin to bgwriter;
connected to shared memory

5. Workers will be direct postmaster children; so postmaster will get
SIGCHLD when worker dies

6. Launcher will start a worker using the following protocol:
   - Set up information on what to run on shared memory
   - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
   - Postmaster will react by starting a worker, and registering it very
 similarly to a regular backend, so it can be shut down easily when
 appropriate.
 (Thus launcher will not be informed right away when worker dies)
   - Worker will examine shared memory to know what to do, clear the
 request, and send a signal to Launcher
   - Launcher wakes up and can start another one if appropriate

Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Kenneth Marshall wrote:

On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...


There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.


Yeah, I had the same idea when we discussed synchronizing sequential 
scans. The biggest difference is that with queries, there's often a user 
waiting for the query to finish, but with vacuum we don't care so much 
how long it takes.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
  
  Yep, agreed on the random I/O issue.  The larger question is if you have
  a huge table, do you care to reclaim 3% of the table size, rather than
  just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
  to take a lot of time, but vacuuming to relaim 3% three times seems like
  it is going to be more expensive than just vacuuming the 10% once.  And
  vacuuming to reclaim 1% ten times seems even more expensive.  The
  partial vacuum idea is starting to look like a loser to me again.
 
 But if the partial vacuum is able to clean the busiest pages and reclaim
 useful space, currently-running transactions will be able to use that
 space and thus not have to extend the table.  Not that extension is a
  problem in itself, but it'll keep your working set smaller.

Yes, but my point is that if you are trying to avoid vacuuming the
table, I am afraid the full index scan is going to be painful too.  I
can see corner cases where partial vacuum is a win (I only have 4 hours
of idle I/O), but for the general case I am still worried that partial
vacuum will not be that useful as long as we have to scan the indexes.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Ron Mayer
Gregory Stark wrote:
 
 Actually no. A while back I did experiments to see how fast reading a file
 sequentially was compared to reading the same file sequentially but skipping
 x% of the blocks randomly. The results were surprising (to me) and depressing.
 The breakeven point was about 7%. [...]
 
 The theory online was that as long as you're reading one page from each disk
 track you're going to pay the same seek overhead as reading the entire track.

Could one take advantage of this observation in designing the DSM?

Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit.  It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
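
For scale (assuming the usual 8 KB block size): a 1 TB table is roughly
128 million pages, so one bit per page is about 16 MB of map, while one
bit per 20 pages is well under 1 MB.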

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Joris Dobbelsteen
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark
Sent: maandag 22 januari 2007 19:41
To: Bruce Momjian
Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen; 
Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan 
Deolasee; Christopher Browne; pgsql-general@postgresql.org; 
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements

Bruce Momjian [EMAIL PROTECTED] writes:

 Yep, agreed on the random I/O issue.  The larger question is if you 
 have a huge table, do you care to reclaim 3% of the table size, rather 
 than just vacuum it when it gets to 10% dirty?  I realize the vacuum is 
 going to take a lot of time, but vacuuming to reclaim 3% three times 
 seems like it is going to be more expensive than just vacuuming the 10% 
 once.  And vacuuming to reclaim 1% ten times seems even more expensive. 
 The partial vacuum idea is starting to look like a loser to me again.

Well the answer is of course that depends.

If you maintain the dead space at a steady state averaging 
1.5% instead of 5% your table is 3.33% smaller on average. If 
this is a DSS system that will translate into running your 
queries 3.33% faster. It will take a lot of vacuums before 
they hurt more than a 3%+ performance drop.

Good, this means a DSS system will mostly do table scans (right?). So
probably you should witness the 'table scan' statistic and rows fetched
approaching the end of the universe (at least compared to
inserts/updates/deletes)?

If it's an OLTP system then it's harder to figure. A 3.33% 
increase in data density will translate to a higher cache hit 
rate, but how much higher depends on a lot of factors. In our 
experiments we actually got a bigger boost in these kinds of 
situations than I expected (I had expected something comparable to 
the 3.33% improvement). So it could be even more than 3.33%. But 
like I said, it depends.
If you already have the whole database in cache you won't see any 
improvement. If you are right on the cusp you could see a huge benefit.

These tables have high insert, update and delete rates, probably a lot
of index scans? I believe the workload on table scans should be (close
to) none.

Are you willing to share some of this measured data? I'm quite
interested in such figures.

It sounds like you're underestimating the performance drain 
10% wasted space has. If we found out that one routine was 
unnecessarily taking 10% of the cpu time it would be an 
obvious focus of attention. 10% wasted space is going to work 
out to about 10% of the i/o time.

It also sounds like we're still focused on the performance 
impact in absolute terms. I'm much more interested in changing 
the performance characteristics so they're predictable and 
scalable. It doesn't matter much if your 1kb table is 100% 
slower than necessary but it does matter if your 1TB table 
needs 1,000x as much vacuuming as your 1GB table even if it's 
getting the same update traffic.

Or rather, the vacuuming should pay back.
A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming.
Obviously, the higher the better.

- Joris Dobbelsteen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:


[snip details]


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.


I can't really speak to the PostgreSQL signaling innards, but this sounds 
logical to me.  I think having the worker processes be children of the 
postmaster and having them be single-minded (or single-tasked) also 
makes a lot of sense.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Joshua D. Drake
Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core
Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more, the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Jonah H. Harris

On 1/22/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

Jonah Harris: WITH/Recursive Queries?


Yup, just talked with Bruce about this last week.  Working on the design now.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
 On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
  Yep, agreed on the random I/O issue.  The larger question is if you have
  a huge table, do you care to reclaim 3% of the table size, rather than
  just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
  to take a lot of time, but vacuuming to reclaim 3% three times seems like
  it is going to be more expensive than just vacuuming the 10% once.  And
  vacuuming to reclaim 1% ten times seems even more expensive.  The
  partial vacuum idea is starting to look like a loser to me again.
 
 Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 

There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.

Ken


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 13:41 +, Heikki Linnakangas wrote:
 Any thoughts before I start experimenting?

Probably only to detail the various use cases we are discussing.

My thoughts on various use cases are:

- small table with frequent update/delete, heap and indexes all/mostly
cached
e.g. Counter tables, DBT2: District/Warehouse TPC-C, pgbench:
Branches/Tellers
Current VACUUM works well in this situation, since the only I/O incurred
is the WAL written for the VACUUM. VACUUM is very cheap even if the table
is not in cache, because of the sequential I/O. Keeping track of whether
there are hot
spots in these tables seems like a waste of cycles and could potentially
introduce contention and hence reduce performance. These need to be very
frequently VACUUMed, even when other VACUUMs are required.
My current view: just need multiple concurrent autovacuum processes.

- large table with severe hotspots
e.g. DBT2: NewOrder, larger queue-style tables
The hotspots are likely to be in cache and the not-so-hotspots might or
might not be in cache, but we don't care either way. DSM concept works
well for this case, since we are able to avoid lots of I/O by
appropriate book-keeping. Works well for removing rows after a file-scan
DELETE, as well as for DELETE or UPDATE hot spots. 
My current view: DSM would be great for this

- large table with few hotspots
e.g. DBT2: Stock, pgbench: Accounts, most Customer tables
Current VACUUM works very badly in this case, since updates are sparsely
distributed across the table. DSM wouldn't help either unless we
differentiate between few/many updates to a block. 
My current view: Piggyback concept seems on the right track, but clearly
needs further thought.

Currently we have only one technique for garbage collection, plus one
process to perform it. We need multiple techniques executed by multiple
processes, when required, plus some way of automatically selecting which
is appropriate depending upon the use case. Yes, automatic :-)

DSM and this piggyback idea need not be thought of as competing
techniques.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 07:24:20PM +, Heikki Linnakangas wrote:
 Kenneth Marshall wrote:
 On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
 Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
 
 There may be other functions that could leverage a similar sort of
 infrastructure. For example, a long DB mining query could be registered
 with the system. Then as the pieces of the table/database are brought in
 to shared memory during the normal daily DB activity they can be acquired
 without forcing the DB to run a very I/O expensive query when waiting a
 bit for the results would be acceptable. As long as we are thinking
 piggyback.
 
 Yeah, I had the same idea when we discussed synchronizing sequential 
 scans. The biggest difference is that with queries, there's often a user 
 waiting for the query to finish, but with vacuum we don't care so much 
 how long it takes.
 
Yes, but with trending and statistical analysis you may not need the
exact answer ASAP. An approximate answer based on a fraction of the
information would be useful. Also, what if queries could be run without
impacting the production uses of a database? One might imagine having a
query with results that converge as the table is processed during normal
use.

Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [BUGS] BUG #2907: pg_get_serial_sequence quoting

2007-01-22 Thread Adriaan van Os

Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

I presume the reason for that is that the first parameter can be
qualified:
select pg_get_serial_sequence('public.FOO', 'Ff1');



Would someone explain why qualification makes us lowercase the first
parameter by default?  I don't understand it well enough to document it.


The point is that we have to parse the first parameter, whereas the
second one can be taken literally.


It still looks inconsistent and ugly. I think the design mistake of pg_get_serial_sequence is that 
it takes two parameters rather than one (a fully qualified, double-quoted column-name path) or three 
(optionally empty schema, table name, column name, all three taken literally).
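
Spelled out against the example upthread (assuming the table was created as "FOO" 
and the column as "Ff1"):

select pg_get_serial_sequence('public.FOO', 'Ff1');    -- first argument is parsed, FOO folds to foo
select pg_get_serial_sequence('public."FOO"', 'Ff1');  -- quotes inside the string preserve the case
-- the second argument is taken literally, so 'Ff1' must already match the column name exactly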


Regards,

Adriaan van Os

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] send password to pg_dump

2007-01-22 Thread shaunc

Hello,

I'm trying to run pg_dump programmatically, and I'm wondering how I can send
it a password.

I'm running it with a system() call in a child process, and sending the
password in from the parent process on stdin, but somehow pg_dump always
manages to ask my terminal for the password anyway. How do I fool it?

(I need this to work in linux... in python would be preferable too.)

Thanks for any advice.

- Shaun
-- 
View this message in context: 
http://www.nabble.com/send-password-to-pg_dump-tf3048210.html#a8473174
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Oleg Bartunov

On Mon, 22 Jan 2007, Joshua D. Drake wrote:


Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core


Teodor Sigaev should be here !


Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more, the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake






Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] send password to pg_dump

2007-01-22 Thread Andrew Dunstan

shaunc wrote:

Hello,

I'm trying to run pg_dump programmatically, and I'm wondering how I can send
it a password.

I'm running it with a system() call in a child process, and sending the
password in from the parent process on stdin, but somehow pg_dump always
finds out how to ask my terminal for a password. How do I fool it?

(I need this to work in linux... in python would be preferable too.)


  


1. This is the wrong forum for this question - here we discuss 
postgresql development, not usage.
2. Use a pgpass file, possibly with a PGPASSFILE environment setting as 
well - see the docs for details, and the sketch below. This works on all 
platforms and should work with anything at all calling pg_dump.
3. In general, the only way to fool programs properly this way is to 
use expect or some equivalent that uses ptys. In this case, it would be 
overkill.
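
For instance (host, database, user, password and paths here are only 
illustrative):

   # one line in ~/.pgpass, or in the file named by PGPASSFILE; must be mode 0600
   localhost:5432:mydb:shaun:secretpw

   # pg_dump then picks the password up via libpq without prompting
   PGPASSFILE=/home/shaun/pgpass pg_dump -h localhost -U shaun mydb > mydb.sql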


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 02:51:47PM +, Heikki Linnakangas wrote:
 I've been looking at the way we do vacuums.
 
 The fundamental performance issue is that a vacuum generates 
 nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to 
 spread the cost like part payment, but the total is the same. In an I/O 
 bound system, the extra I/O directly leads to less throughput.
 
 Therefore, we need to do less I/O. Dead space map helps by allowing us 
 to skip blocks that don't need vacuuming, reducing the # of I/Os to 
 2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the 
 dead tuples are spread uniformly.
 
 If we could piggyback the vacuum I/Os to the I/Os that we're doing 
 anyway, vacuum wouldn't ideally have to issue any I/O of its own. I've 
 tried to figure out a way to do that.
 
 Vacuum is done in 3 phases:
 
 1. Scan heap
 2. Vacuum index
 3. Vacuum heap
 


 Instead of doing a sequential scan, we could perform the 1st phase by 
 watching the buffer pool, scanning blocks for dead tuples when they're 
 in memory and keeping track of which pages we've seen. When all pages 
 have been seen, the tid list is sorted and 1st phase is done.
 
 In theory, the index vacuum could also be done that way, but let's 
 assume for now that indexes would be scanned like they are currently.
 
 The 3rd phase can be performed similarly to the 1st phase. Whenever a 
 page enters the buffer pool, we check the tid list and remove any 
 matching tuples from the page. When the list is empty, vacuum is complete.

Is there any real reason to demark the start and end of a vacuum? Why
not just go to a continuous process? One possibility is to keep a list
of TIDs for each phase, though that could prove tricky with multiple
indexes.

 A variation of the scheme would be to keep scanning pages that are in 
 cache, until the tid list reaches a predefined size, instead of keeping 
 track of which pages have already been seen. That would deal better with 
 tables with hot and cold spots, but it couldn't advance the relfrozenid 
 because there would be no guarantee that all pages are visited. Also, we 
 could start 1st phase of the next vacuum, while we're still in the 3rd 
 phase of previous one.
 
What if we tracked freeze status on a per-page basis? Perhaps track the
minimum XID that's on each page. That would allow us to ensure that we
freeze pages that are approaching XID wrap.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
 Gregory Stark wrote:
  
  Actually no. A while back I did experiments to see how fast reading a file
  sequentially was compared to reading the same file sequentially but skipping
  x% of the blocks randomly. The results were surprising (to me) and 
  depressing.
  The breakeven point was about 7%. [...]
  
  The theory online was that as long as you're reading one page from each disk
  track you're going to pay the same seek overhead as reading the entire 
  track.
 
 Could one take advantage of this observation in designing the DSM?
 
 Instead of a separate bit representing every page, having each bit
 represent 20 or so pages might be a more useful unit.  It sounds
 like the time spent reading would be similar; while the bitmap
 would be significantly smaller.

If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 03:40:17PM +, Simon Riggs wrote:
 This would add a table-specific ANALYZE statement following each table's
 actions.

It'd probably be best to put it before any index creating activities,
since there's a better chance of everything from the table being in
shared buffers.

Better yet would be if COPY could analyze data as it was loaded in...
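
i.e. roughly this ordering in the dump (table and index names are made up):

COPY orders (id, total) FROM stdin;
...
\.
ANALYZE orders;                                   -- data likely still in shared buffers
CREATE INDEX orders_total_idx ON orders (total);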
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [pgsql-patches] [HACKERS] Win32 WEXITSTATUS too

2007-01-22 Thread Bruce Momjian
bruce wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
   Bruce Momjian wrote:
   OK, maybe /doc or src/tools.  A more radical approach would be to put
   the list in our documentation, or have initdb install it.
  
   Why not put it in techdocs or some such?
  
  I think we've learned by now that putting copies of other peoples' code
  in our tree isn't such a hot idea; what is going to cause it to be
  updated when things change?  How do you know the values are even the
  same across all the Windows versions we support?
  
  Basically this whole idea is misconceived.  Just print the number and
  have done.
 
 And how do people interpret that number?

Ah, I found something:

http://support.microsoft.com/kb/259693

Someone on IRC says that is kernel mode only, and is looking for a
user-mode version, so we would be able to print out a meaningful message
rather than a hex value that has to be looked up.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
 4. Launcher will be a continuously-running process, akin to bgwriter;
 connected to shared memory
 
So would it use up a database connection?

 5. Workers will be direct postmaster children; so postmaster will get
 SIGCHLD when worker dies

As part of this I think we need to make it more obvious how all of this
ties into max_connections. Currently, autovac ties up one of the
super-user connections whenever it's not asleep; these changes would
presumably mean that more of those connections could be tied up.

Rather than forcing users to worry about adjusting max_connections and
superuser_reserved_connections to accommodate autovacuum, the system
should handle it for them.

Were you planning on limiting the number of concurrent vacuum processes
that could be running? If so, we could probably just increase superuser
connections by that amount. If not, we might need to think of something
else...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Steve Atkins


On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:


Bruce Momjian wrote:
Yep, agreed on the random I/O issue.  The larger question is if you have 
a huge table, do you care to reclaim 3% of the table size, rather than 
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going 
to take a lot of time, but vacuuming to reclaim 3% three times seems like 
it is going to be more expensive than just vacuuming the 10% once.  And 
vacuuming to reclaim 1% ten times seems even more expensive.  The 
partial vacuum idea is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than  
just paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you  
can at least do something useful with a large table if a full  
vacuum takes 24 hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way  
back when isn't directly addressing the issue. What most people  
want (I'm guessing) is to vacuum *when the user-workload allows*  
and the time-tabling is just a sysadmin first-approximation at that.


Yup. I'd really like for my app to be able to say “Hmm. No 
interactive users at the moment, no critical background tasks. Now 
would be a really good time for the DB to do some maintenance.” but 
also to be able to interrupt the maintenance process if some new 
users or other system load show up.


With partial vacuuming possible, we can arrange things with just  
three thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table “requires vacuuming” value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to  
the thresholds and starts a new vacuum, kills one or does nothing.  
New vacuum processes keep getting started as long as there is  
workload spare and tables that need vacuuming.


Now the trick of course is how you measure system workload in a  
meaningful manner.


I'd settle for a “start maintenance” / “stop maintenance” API.  
Anything else (for instance the heuristics you suggest above) would  
definitely be gravy.


It's not going to be simple to do, though, I don't think.

Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread korryd


 Thought I would do a poll of what is happening in the world for 8.3. I have:
 
 Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
 Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
 Jonah Harris: WITH/Recursive Queries?
 Andrei Kovalesvki: Some Win32 work with Magnus
 Magnus Hagander: VC++ support (thank goodness)
 Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
 Oleg Bartunov: Tsearch2 in core
 Neil Conway: Patch Review (including enums), pg_fcache



Korry Douglas: PL/pgSQL debugger (and probably a PL/pgSQL execution profiler as 
well)




--
  Korry Douglas[EMAIL PROTECTED]
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] -f output file option for pg_dumpall

2007-01-22 Thread elein
On Mon, Jan 15, 2007 at 10:13:16AM -0500, Andrew Dunstan wrote:
 
 
 Neil Conway wrote:
 On Thu, 2007-01-11 at 14:36 -0500, Neil Conway wrote:
   
 I don't think they need to be integrated any time soon, but if we were
 to design pg_dump and pg_dumpall from scratch, it seems more logical to
 use a single program
 
 
 On thinking about this some more, it might be useful to factor much of
 pg_dump's logic for reconstructing the state of a database into a shared
 library. This would make it relatively easy for developers to plug new
 archive formats into the library (in addition to the present 3 archive
 formats), or to make use of this functionality in other applications
 that want to reconstruct the logical state of a database from the
 content of the system catalogs. We could then provide a client app
 implemented on top of the library that would provide similar
 functionality to pg_dump.
 
 Moving pg_dump's functionality into the backend has been suggested in
 the past (and rejected for good reason), but I think this might be a
 more practical method for making the pg_dump logic more easily reusable.
 
 
   
 
 I like this idea. For example, we might usefully map some of this to 
 psql \ commands, without having to replicate the underlying logic.

Don't we already do this with the .psqlrc file?

--elein


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] -f output file option for pg_dumpall

2007-01-22 Thread Andrew Dunstan
elein wrote:
 On Mon, Jan 15, 2007 at 10:13:16AM -0500, Andrew Dunstan wrote:


 Neil Conway wrote:
 On Thu, 2007-01-11 at 14:36 -0500, Neil Conway wrote:
 
 I don't think they need to be integrated any time soon, but if we were
 to design pg_dump and pg_dumpall from scratch, it seems more logical
 to
 use a single program
 
 
 On thinking about this some more, it might be useful to factor much of
 pg_dump's logic for reconstructing the state of a database into a
 shared
 library. This would make it relatively easy for developers to plug new
 archive formats into the library (in addition to the present 3 archive
 formats), or to make use of this functionality in other applications
 that want to reconstruct the logical state of a database from the
 content of the system catalogs. We could then provide a client app
 implemented on top of the library that would provide similar
 functionality to pg_dump.
 
 Moving pg_dump's functionality into the backend has been suggested in
 the past (and rejected for good reason), but I think this might be a
 more practical method for making the pg_dump logic more easily
 reusable.
 
 
 

 I like this idea. For example, we might usefully map some of this to
 psql \ commands, without having to replicate the underlying logic.

 Don't we already do this with the .psqlrc file?


No. \ commands are implemented in C code.

cheers

andrew


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Strange file in snapshot tarball

2007-01-22 Thread ITAGAKI Takahiro

Michael Meskes [EMAIL PROTECTED] wrote:

  There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
  under the path 'src/interfaces/ecpg/test/expected/' in the recent
  ftp snapshot (postgresql-snapshot.tar.gz).
 
 Would you please care to explain? I do not understand what you are
 saying. 

Oops, sorry. It's my extractor's failure.
The file is the only one in the tarball whose name is longer than
100 characters. My extractor does not support long file names.

I'm sorry for all the fuss.

---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Would this SPI function be useful?

2007-01-22 Thread Jeff Davis

Would it be useful to have an SPI function that returns the OID and
namespace of the function being executed?

The reason I bring this up is due to a discussion on the PostGIS lists
about making the installation able to work in any namespace from one
generic SQL script.

The problem they have is that, because the functions don't know what
schema they reside in, they don't know how to call out to other
functions in the same namespace.
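
A small sketch of the problem (schema and function names are made up):

CREATE SCHEMA gis;
SET search_path = gis, public;
CREATE FUNCTION gis.helper(text) RETURNS text
    LANGUAGE sql AS $$ SELECT upper($1) $$;
CREATE FUNCTION gis.area_label(text) RETURNS text
    LANGUAGE sql AS $$ SELECT helper($1) $$;  -- unqualified call to a sibling

SET search_path = public;       -- a caller who has not put gis on the path
SELECT gis.area_label('poly');  -- ERROR:  function helper(text) does not exist

If the function could find out at run time which schema it lives in, it could
schema-qualify such calls (or adjust the path) itself.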

It might be useful to have a few basic functions that allow developers
of add-ons to access information like that. The ones that come to my mind are:

SPI_get_my_oid();
SPI_get_my_nspname();

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Jeff Davis
On Mon, 2007-01-22 at 14:16 -0800, Joshua D. Drake wrote:
 I am sure there are more, the ones with question marks are unknowns but
 heard of in the ether somewhere. Any additions or confirmations?
 

I'd still like to make an attempt at my Synchronized Scanning patch.

If freeze is 10 weeks away, I better get some more test results posted
soon, however. 

Regards,
Jeff Davis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread ITAGAKI Takahiro

Joshua D. Drake [EMAIL PROTECTED] wrote:

 Thought I would do a poll of what is happening in the world for 8.3. I have:
 
 Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
 Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
 Jonah Harris: WITH/Recursive Queries?
 Andrei Kovalesvki: Some Win32 work with Magnus
 Magnus Hagander: VC++ support (thank goodness)
 Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
 Oleg Bartunov: Tsearch2 in core
 Neil Conway: Patch Review (including enums), pg_fcache

I'm working on the Dead Space Map and load-distribution of checkpoints.
I intend to have them done by 8.3.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
 Thought I would do a poll of what is happening in the world for 8.3. I have:

It seems unlikely that I'm going to have time at the rate things are
going but I was hoping to take a whack at default permissions/ownership
by schema.  Kind of a umask-type thing but for schemas instead of roles
(though I've thought about it per role and that might also solve the
particular problem we're having atm).

Thanks,

Stephen




Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
 Thought I would do a poll of what is happening in the world for 8.3. I have:

Another thing which was mentioned previously which I'd really like to
see happen (and was discussed on the list...) is replacing the Kerberos
support with GSSAPI support and adding support for SSPI.  Don't recall
who had said they were looking into working on it though..

Thanks,

Stephen




[HACKERS] Default permissisons from schemas

2007-01-22 Thread Stephen Frost
Greetings,

* Stephen Frost ([EMAIL PROTECTED]) wrote:
 It seems unlikely that I'm going to have time at the rate things are
 going but I was hoping to take a whack at default permissions/ownership
 by schema.  Kind of a umask-type thing but for schemas instead of roles
 (though I've thought about it per role and that might also solve the
 particular problem we're having atm).

  Following up on my reply to Joshua, what I'd like to propose is, for
  comments and suggestions:

  ALTER SCHEMA name [ [ WITH ] [ DEFAULT ] option [ ... ] ]

  where option can be:

{ GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER 
| EXECUTE } 
  [,...] | ALL [ PRIVILEGES ] } 
  TO { role | PUBLIC } [, ...] [ WITH GRANT OPTION ] 
} [, ...]

OWNER role

  pg_namespace would be modified to have two new columns, 
  nspdefowner oid, and nspdefacl aclitem[].  When NULL these would have
  no effect.  When not-null the 'nspdefowner' would be the owner of all
  objects created in the schema.  When not-null the 'nspdefacl' would be
  the initial acl for the object (modified for what grants are valid for
  the specific type of object).  These can only be changed by the schema
  owner and the 'OWNER role' must have create permissions in the schema.
  Ideally this would be checked when the ALTER SCHEMA is issued and then
  a dependency created for that.  If that's not possible today then the
  rights check would be done when an object creation is attempted,
  possibly with a fall-back to check the current user's rights.

  The defaults would be NULL for these so there would be no change in
  behaviour unless specifically asked for.
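
  For instance (a usage sketch of the proposed syntax only; the schema and
  role names are made up):

  ALTER SCHEMA app WITH DEFAULT
      GRANT SELECT, INSERT, UPDATE, DELETE TO app_writers
      OWNER app_owner;

  Any table later created in schema app would then start out owned by
  app_owner with those grants already applied.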

  I believe this would cover the following to-do item:
  Allow GRANT/REVOKE permissions to be inherited by objects based on
  schema permissions

  Comments?

Thanks,

Stephen




Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread ITAGAKI Takahiro

Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Vacuum is done in 3 phases:
 1. Scan heap
 2. Vacuum index
 3. Vacuum heap

 A variation of the scheme would be to keep scanning pages that are in 
 cache, until the tid list reaches a predefined size, instead of keeping 
 track of which pages have already been seen. That would deal better with 
 tables with hot and cold spots, but it couldn't advance the relfrozenxid 
 because there would be no guarantee that all pages are visited. Also, we 
 could start 1st phase of the next vacuum, while we're still in the 3rd 
 phase of previous one.

ISTM it is another DSM, one with tuple-level accuracy rather than page-level.
One of the benefits is that we can skip the 1st phase of vacuum; we will
have a TID list of dead tuples at the start of vacuum, so we can start
from the 2nd phase.

I have another idea for the use of TID lists -- store the TIDs after the 1st
or 2nd phase, and exit the vacuum. At the next vacuum, we will do both
the previous 3rd phase and the new 1st phase at once, so that I/Os are reduced
to (ndirtyblocks + nindexblocks) from (2*ndirtyblocks + nindexblocks) on
average. We already use a similar method in vacuuming btree indexes
to collect recyclable empty pages.

I think piggybacking of I/Os is very useful. The buffer manager helps us
fold up some of the I/Os, but explicit ordering is more effective.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
We are trying to develop updateable cursor functionality for 
PostgreSQL. Details of the design, and also the issues we are 
facing, are given below.  We are looking forward to advice on how to 
proceed with these issues.

Rgds,
Arul Shaji




 
1. Introduction
--
This is a combined proposal and design document for adding updatable 
(insensitive) cursor capability to the PostgreSQL database. 
There have already been a couple of previous proposals since 2003 for 
implementing this feature so there appears to be community interest in doing 
so. This will enable the following constructs to be processed:


UPDATE table_name SET value_list WHERE CURRENT OF cursor_name
DELETE FROM table_name WHERE CURRENT OF cursor_name

This has the effect of users being able to update or delete specific rows of 
a table, as defined by the row currently fetched into the cursor.
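
For example, the following sequence, which is rejected today at the DECLARE 
... FOR UPDATE stage, would become possible (table and cursor names are 
illustrative):

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM accounts FOR UPDATE OF accounts;
FETCH NEXT FROM cur;
UPDATE accounts SET balance = balance + 10 WHERE CURRENT OF cur;
FETCH NEXT FROM cur;
DELETE FROM accounts WHERE CURRENT OF cur;
COMMIT;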


2. Overall Conceptual Design
-
The design is considered from the viewpoint of progression of a command 
through the various stages of processing, from changes to the file ‘gram.y’ 
to implement the actual grammar changes, through to changes in the Executor 
portion of the database architecture.

2.1 Changes to the Grammar
--
The following changes will be done to the PostgreSQL grammar:

UPDATE statement has the option ‘WHERE CURRENT OF cursor_name’ added
DELETE statement has the option ‘WHERE CURRENT OF cursor_name’ added

The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
contains just the name of the cursor.

The pl/pgsql grammar changes in the same manner.

The word CURRENT will be added to the ScanKeywords array in keywords.c.


2.2 Changes to Affected Data Structures
--
The following data structures are affected by this change: 

Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
structures

The Portal will contain a list of structures of relation ids and tuple ids 
relating to the tuple held in the QueryDesc structure. There will be one 
entry in the relation and tuple id list for each entry in the relation-list 
of the statement below: 

DECLARE cursor_name [WITH HOLD] SELECT FOR UPDATE OF relation-list 

The QueryDesc structure will contain the relation id and the tuple id 
relating to the tuple obtained via the FETCH command so that it can be 
propagated back to the Portal for storage in the list described above.

The UpdateStmt and DeleteStmt structures have the cursor name added so that 
the information is available for use in obtaining the portal structure 
related to the cursor previously opened via the DECLARE CURSOR request.


2.3 Changes to the SQL Parser

At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
been present in the grammar, it causes an error message later in the 
processing since cursors are currently not updatable. This now needs to 
change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
clause. 

The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
rtable in the Query structure and identified by means of the rowMarks array. 
In the case of an updatable cursor the FOR SHARE option is not allowed 
therefore all entries in the rtable that are identified by the rowMarks array 
must relate to tables that are FOR UPDATE.

In the UPDATE or DELETE statements the ‘WHERE CURRENT OF cursor_name’ 
clause results in the cursor name being placed in the UpdateStmt or 
DeleteStmt structure. During the processing of the functions - 
transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
obtain a pointer to the related Portal structure and the tuple affected by 
the current UPDATE or DELETE statement is extracted from the Portal, where it 
has been placed as the result of a previous FETCH request. At this point all 
the information for the UPDATE or DELETE statement is available so the 
statements can be transformed into standard UPDATE or DELETE statements and 
sent for re-write/planning/execution as usual.

2.4 Changes to the Optimizer
--
There is a need to add a TidScan node to planning UPDATE / DELETE statements 
where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
tuple ids of the tuples in the tables relating to the query to be obtained. 
There will need to be a new mechanism to achieve this, as at present, a Tid 
scan is done only if there is a standard WHERE condition on update or delete 
statements to provide Tid qualifier data.
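
For comparison, a Tid scan is chosen today only when the statement itself 
carries a ctid qualifier, e.g. (names and the tuple id are illustrative):

UPDATE accounts SET balance = 0 WHERE ctid = '(0,1)';

The positioned UPDATE/DELETE needs the same access path, with the tuple id 
supplied from the cursor instead of being written in the query.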


2.5 Changes to the Executor
---
There are various options that have been considered for this part of the 
enhancement. These are described in the sections below.

We would like to hear opinions on which option is the best way to go or if 
none of these is acceptable, any alternate ideas ?

Option 1  MVCC Via Continuous Searching of Database

The 

Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
FAST PostgreSQL wrote:
 We are trying to develop the updateable cursors functionality into 
 Postgresql. I have given below details of the design and also issues we are 
 facing.  Looking forward to the advice on how to proceed with these issues.
 
 Rgds,
 Arul Shaji

Would this be something that you would hope to submit for 8.3?

Joshua D. Drake


 
 
 
 
  
 1. Introduction
 --
 This is a combined proposal and design document for adding updatable 
 (insensitive) cursor capability to the PostgreSQL database. 
 There have already been a couple of previous proposals since 2003 for 
 implementing this feature so there appears to be community interest in doing 
 so. This will enable the following constructs to be processed:
 
 
 UPDATE table_name SET value_list WHERE CURRENT OF cursor_name
 DELETE FROM table_name WHERE CURRENT OF cursor_name
 
 This has the effect of users being able to update or delete specific rows of 
 a table, as defined by the row currently fetched into the cursor.
 
 
 2. Overall Conceptual Design
 -
 The design is considered from the viewpoint of progression of a command 
 through the various stages of processing, from changes to the file ‘gram.y’ 
 to implement the actual grammar changes, through to changes in the Executor 
 portion of the database architecture.
 
 2.1 Changes to the Grammar
 --
 The following changes will be done to the PostgreSQL grammar:
 
 UPDATE statement has the option ‘WHERE CURRENT OF cursor_name’ added
 DELETE statement has the option ‘WHERE CURRENT OF cursor_name’ added
 
 The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
 contains just the name of the cursor.
 
 The pl/pgsql grammar changes in the same manner.
 
 The word CURRENT will be added to the ScanKeywords array in keywords.c.
 
 
 2.2 Changes to Affected Data Structures
 --
 The following data structures are affected by this change: 
 
 Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
 structures
 
 The Portal will contain a list of structures of relation ids and tuple ids 
 relating to the tuple held in the QueryDesc structure. There will be one 
 entry in the relation and tuple id list for each entry in the relation-list 
 of the statement below: 
 
 DECLARE cursor_name [WITH HOLD] SELECT FOR UPDATE OF relation-list 
 
 The QueryDesc structure will contain the relation id and the tuple id 
 relating to the tuple obtained via the FETCH command so that it can be 
 propagated back to the Portal for storage in the list described above.
 
 The UpdateStmt and DeleteStmt structures have the cursor name added so that 
 the information is available for use in obtaining the portal structure 
 related to the cursor previously opened via the DECLARE CURSOR request.
 
 
 2.3 Changes to the SQL Parser
 
 At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
 been present in the grammar, it causes an error message later in the 
 processing since cursors are currently not updatable. This now needs to 
 change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
 clause. 
 
 The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
 rtable in the Query structure and identified by means of the rowMarks array. 
 In the case of an updatable cursor the FOR SHARE option is not allowed 
 therefore all entries in the rtable that are identified by the rowMarks array 
 must relate to tables that are FOR UPDATE.
 
 In the UPDATE or DELETE statements the ‘WHERE CURRENT OF cursor_name’ 
 clause results in the cursor name being placed in the UpdateStmt or 
 DeleteStmt structure. During the processing of the functions - 
 transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
 obtain a pointer to the related Portal structure and the tuple affected by 
 the current UPDATE or DELETE statement is extracted from the Portal, where it 
 has been placed as the result of a previous FETCH request. At this point all 
 the information for the UPDATE or DELETE statement is available so the 
 statements can be transformed into standard UPDATE or DELETE statements and 
 sent for re-write/planning/execution as usual.
 
 2.4 Changes to the Optimizer
 --
 There is a need to add a TidScan node to planning UPDATE / DELETE statements 
 where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
 tuple ids of the tuples in the tables relating to the query to be obtained. 
 There will need to be a new mechanism to achieve this, as at present, a Tid 
 scan is done only if there is a standard WHERE condition on update or delete 
 statements to provide Tid qualifier data.
 
 
 2.5 Changes to the Executor
 ---
 There are various options that have been considered for this part of the 
 enhancement. These are 

Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Jim C. Nasby wrote:
 On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
  4. Launcher will be a continuously-running process, akin to bgwriter;
  connected to shared memory
  
 So would it use up a database connection?

No.  It's connected to shared memory and has access to pgstats, but it's
not connected to any database so it's not counted.  You'd say it has the
same status as the bgwriter.

  5. Workers will be direct postmaster children; so postmaster will get
  SIGCHLD when worker dies
 
 As part of this I think we need to make it more obvious how all of this
 ties into max_connections. Currently, autovac ties up one of the
 super-user connections whenever it's not asleep; these changes would
 presumably mean that more of those connections could be tied up.

Sure.

 Rather than forcing users to worry about adjusting max_connections and
 superuser_reserved_connections to accommodate autovacuum, the system
 should handle it for them.
 
 Were you planning on limiting the number of concurrent vacuum processes
 that could be running? If so, we could probably just increase superuser
 connections by that amount. If not, we might need to think of something
 else...

The fact that I'm currently narrowly focused on process handling means
that I don't want to touch scheduling at all for now, so I'm gonna make
it so that the launcher decides to launch a worker run only when no
other worker is running.  Thus only a single vacuum thread at any
time.  In the meantime you're welcome to think on the possible solutions
to that problem, which we'll have to attack at some point in the
(hopefully) near future ;-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
 FAST PostgreSQL wrote:
  We are trying to develop the updateable cursors functionality into
  Postgresql. I have given below details of the design and also issues we
  are facing.  Looking forward to the advice on how to proceed with these
  issues.
 
  Rgds,
  Arul Shaji

 Would this be something that you would hope to submit for 8.3?

Yes definitely. If we can finish it before the feature freeze of course.

Rgds,
Arul Shaji


 Joshua D. Drake

  1. Introduction
  --
  This is a combined proposal and design document for adding updatable
  (insensitive) cursor capability to the PostgreSQL database.
  There have already been a couple of previous proposals since 2003 for
  implementing this feature so there appears to be community interest in
  doing so. This will enable the following constructs to be processed:
 
 
  UPDATE table_name SET value_list WHERE CURRENT OF cursor_name
  DELETE FROM table_name WHERE CURRENT OF cursor_name
 
  This has the effect of users being able to update or delete specific rows
  of a table, as defined by the row currently fetched into the cursor.
 
 
  2. Overall Conceptual Design
  -
  The design is considered from the viewpoint of progression of a command
  through the various stages of processing, from changes to the file
  ‘gram.y’ to implement the actual grammar changes, through to changes in
  the Executor portion of the database architecture.
 
  2.1 Changes to the Grammar
  --
  The following changes will be done to the PostgreSQL grammar:
 
  UPDATE statement has the option ‘WHERE CURRENT OF cursor_name’ added
  DELETE statement has the option ‘WHERE CURRENT OF cursor_name’ added
 
  The cursor_name data is held in the UpdateStmt and DeleteStmt structures
  and contains just the name of the cursor.
 
  The pl/pgsql grammar changes in the same manner.
 
  The word CURRENT will be added to the ScanKeywords array in keywords.c.
 
 
  2.2 Changes to Affected Data Structures
  --
  The following data structures are affected by this change:
 
  Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
  structures
 
  The Portal will contain a list of structures of relation ids and tuple
  ids relating to the tuple held in the QueryDesc structure. There will be
  one entry in the relation and tuple id list for each entry in the
  relation-list of the statement below:
 
  DECLARE cursor_name [WITH HOLD] SELECT FOR UPDATE OF relation-list
 
  The QueryDesc structure will contain the relation id and the tuple id
  relating to the tuple obtained via the FETCH command so that it can be
  propagated back to the Portal for storage in the list described above.
 
  The UpdateStmt and DeleteStmt structures have the cursor name added so
  that the information is available for use in obtaining the portal
  structure related to the cursor previously opened via the DECLARE CURSOR
  request.
 
 
  2.3 Changes to the SQL Parser
  
  At present, although the FOR UPDATE clause of the DECLARE CURSOR command
  has been present in the grammar, it causes an error message later in the
  processing since cursors are currently not updatable. This now needs to
  change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’
  clause.
 
  The relation names that follow the ‘FOR UPDATE’ clause will be added to
  the rtable in the Query structure and identified by means of the rowMarks
  array. In the case of an updatable cursor the FOR SHARE option is not
  allowed therefore all entries in the rtable that are identified by the
  rowMarks array must relate to tables that are FOR UPDATE.
 
  In the UPDATE or DELETE statements the ‘WHERE CURRENT OF cursor_name’
  clause results in the cursor name being placed in the UpdateStmt or
  DeleteStmt structure. During the processing of the functions -
  transformDeleteStmt() and transformUpdateStmt() - the cursor name is used
  to obtain a pointer to the related Portal structure and the tuple
  affected by the current UPDATE or DELETE statement is extracted from the
  Portal, where it has been placed as the result of a previous FETCH
  request. At this point all the information for the UPDATE or DELETE
  statement is available so the statements can be transformed into standard
  UPDATE or DELETE statements and sent for re-write/planning/execution as
  usual.
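
  Conceptually (a sketch of the intended transformation only; the tuple id
  shown is made up), once a FETCH has positioned cursor cur on a row, the
  statement

  UPDATE accounts SET balance = 0 WHERE CURRENT OF cur;

  would be turned into an ordinary update against that row's tuple id, taken
  from the Portal, along the lines of:

  UPDATE accounts SET balance = 0 WHERE ctid = '(12,3)';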
 
  2.4 Changes to the Optimizer
  ----------------------------
  A TidScan node needs to be added when planning UPDATE / DELETE statements
  of the "UPDATE / DELETE at position" form. This is to enable the tuple ids
  of the tuples in the tables relating to the query to be obtained. There
  will need to be a new mechanism to achieve this because, at present, a Tid
  scan is done only if there is a standard WHERE condition on an update or
  delete statement to provide a Tid qualifier.
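
  For illustration (a sketch assuming a table named t; the plan text is
  approximate), this is the kind of plan the existing Tid scan path produces
  for an explicit ctid qualifier, and what "UPDATE / DELETE at position"
  would need to produce without one:

  EXPLAIN UPDATE t SET flag = true WHERE ctid = '(0,1)';
  --  Tid Scan on t
  --    TID Cond: (ctid = '(0,1)'::tid)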

Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
FAST PostgreSQL wrote:
 On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
 FAST PostgreSQL wrote:
 We are trying to develop the updateable cursors functionality into
 Postgresql. I have given below details of the design and also issues we
 are facing.  Looking forward to the advice on how to proceed with these
 issues.

 Rgds,
 Arul Shaji
 Would this be something that you would hope to submit for 8.3?
 
 Yes definitely. If we can finish it before the feature freeze of course.

Great! I will put it on my "Remember to bug Arul" list :)

Sincerely,

Joshua D. Drake

 

Re: [HACKERS] Default permissisons from schemas

2007-01-22 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
   Following up on my reply to Joshua, what I'd like to propose is, for
   comments and suggestions:

   ALTER SCHEMA name [ [ WITH ] [ DEFAULT ] option [ ... ] ]

   where option can be:

 { GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER | EXECUTE }
 [,...] | ALL [ PRIVILEGES ] }
 TO { role | PUBLIC } [, ...] [ WITH GRANT OPTION ]
 } [, ...]

   OWNER role
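
For instance (purely illustrative; this is only the proposed syntax, and the
role names are made up), the idea is that one could write:

    ALTER SCHEMA app WITH DEFAULT GRANT SELECT TO reporting;
    ALTER SCHEMA app WITH DEFAULT OWNER app_owner;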

This seems to ignore the problem that different types of objects have
different privileges.  E.g., if I want to grant USAGE on all sequences,
that doesn't necessarily mean I want to grant USAGE on all languages.

   When not-null the 'nspdefowner' would be the owner of all
   objects created in the schema.

Whoa.  You are going to allow people to create objects owned by someone
else?  I don't think so ... most Unix systems have forbidden object
give-away for years, for very good reasons.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Updateable cursors

2007-01-22 Thread Lukas Kahwe Smith

Joshua D. Drake wrote:


 Great! I will put it on my "Remember to bug Arul" list :)


Hey Joshua,

could you put this stuff here:
http://developer.postgresql.org/index.php/Todo:WishlistFor83

I will try to find some time during this week (likely on the weekend) to 
also try and figure out if these items are real and if the people still 
think they can do them for 8.3 .. your additions would be most helpful.


regards,
Lukas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Mon, Jan 22, 2007 at 03:40:17PM +, Simon Riggs wrote:
 This would add a table-specific ANALYZE statement following each table's
 actions.

 It'd probably be best to put it before any index creating activities,

No, because then you'd fail to accumulate any stats on partial or
functional indexes.  There's been talk of using the presence of
multi-column indexes to guide creation of cross-column statistics, too.
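
As a small sketch of why the ordering matters (hypothetical table and index
names; the point is that ANALYZE can only gather statistics on an indexed
expression if the index already exists):

    CREATE TABLE t (email text);
    CREATE INDEX t_lower_email_idx ON t (lower(email));  -- expression index
    ANALYZE t;  -- now also collects stats on lower(email) via the index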

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
Lukas Kahwe Smith wrote:
 Joshua D. Drake wrote:
 
  Great! I will put it on my "Remember to bug Arul" list :)
 
 Hey Joshua,
 
 could you put this stuff here:
 http://developer.postgresql.org/index.php/Todo:WishlistFor83

Sure if you bother to unlock the page for me ;)

 
 I will try to find some time during this week (likely on the weekend) to
 also try and figure out if these items are real and if the people still
 think they can do them for 8.3 .. your additions would be most helpful.
 
 regards,
 Lukas
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
   http://www.postgresql.org/docs/faq
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 05:51:53PM +, Gregory Stark wrote:
 Actually no. A while back I did experiments to see how fast reading a file
 sequentially was compared to reading the same file sequentially but skipping
 x% of the blocks randomly. The results were surprising (to me) and depressing.
 The breakeven point was about 7%.

I assume this means you were reading 7% of the blocks, not skipping 7%
of the blocks, when you broke even?

I presume by break-even you mean it took just as long, time-wise. But
did it have the same effect on system load? If reading only 7% of the
blocks allows the drive to complete other requests more quickly, then
it's beneficial, even if the vacuum takes longer.

This may be a silly thought, I'm not sure how drives handle multiple
requests...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature