Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 05:51:53PM +, Gregory Stark wrote:
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.

I assume this means you were reading 7% of the blocks, not skipping 7%
of the blocks when you broke even?

I presume by break-even you mean it took just as long, time-wise. But
did it have the same effect on system load? If reading only 7% of the
blocks allows the drive to complete other requests more quickly then
it's beneficial, even if the vacuum takes longer.

This may be a silly thought, I'm not sure how drives handle multiple
requests...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
Lukas Kahwe Smith wrote:
> Joshua D. Drake wrote:
> 
>> Great! I will put it on my, "Remember to bug Arul" list :)
> 
> Hey Joshua,
> 
> could you put this stuff here:
> http://developer.postgresql.org/index.php/Todo:WishlistFor83

Sure if you bother to unlock the page for me ;)

> 
> I will try to find some time during this week (likely on the weekend) to
> also try and figure out if these items are real and if the people still
> think they can do them for 8.3 .. your additions would be most helpful.
> 
> regards,
> Lukas
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>   http://www.postgresql.org/docs/faq
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Mon, Jan 22, 2007 at 03:40:17PM +, Simon Riggs wrote:
>> This would add a table-specific ANALYZE statement following each table's
>> actions.

> It'd probably be best to put it before any index creating activities,

No, because then you'd fail to accumulate any stats on partial or
functional indexes.  There's been talk of using the presence of
multi-column indexes to guide creation of cross-column statistics, too.
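
To illustrate with a hypothetical dump fragment (table and index names
invented), the ANALYZE has to follow the index creation, otherwise the
statistics for the lower(email) expression would never be gathered:

CREATE TABLE users (id int, email text);
COPY users (id, email) FROM stdin;
\.
CREATE INDEX users_lower_email_idx ON users (lower(email));
ANALYZE users;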

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updateable cursors

2007-01-22 Thread Lukas Kahwe Smith

Joshua D. Drake wrote:


Great! I will put it on my, "Remember to bug Arul" list :)


Hey Joshua,

could you put this stuff here:
http://developer.postgresql.org/index.php/Todo:WishlistFor83

I will try to find some time during this week (likely on the weekend) to 
also try and figure out if these items are real and if the people still 
think they can do them for 8.3 .. your additions would be most helpful.


regards,
Lukas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Default permissisons from schemas

2007-01-22 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
>   Following up on my reply to Joshua, what I'd like to propose is, for
>   comments and suggestions:

>   ALTER SCHEMA name [ [ WITH ] [ DEFAULT ] option [ ... ] ]

>   where option can be:

> { GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER
> | EXECUTE }
> [,...] | ALL [ PRIVILEGES ] }
> TO { role | PUBLIC } [, ...] [ WITH GRANT OPTION ]
> } [, ...]

>   OWNER role

This seems to ignore the problem that different types of objects have
different privileges.  E.g., if I want to grant USAGE on all sequences
that doesn't necessarily mean I want to grant USAGE on all languages.

>   When not-null the 'nspdefowner' would be the owner of all
>   objects created in the schema.

Whoa.  You are going to allow people to create objects owned by someone
else?  I don't think so ... most Unix systems have forbidden object
give-away for years, for very good reasons.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
FAST PostgreSQL wrote:
> On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
>> FAST PostgreSQL wrote:
>>> We are trying to develop the updateable cursors functionality into
>>> Postgresql. I have given below details of the design and also issues we
>>> are facing.  Looking forward to the advice on how to proceed with these
>>> issues.
>>>
>>> Rgds,
>>> Arul Shaji
>> Would this be something that you would hope to submit for 8.3?
> 
> Yes definitely. If we can finish it before the feature freeze of course.

Great! I will put it on my, "Remember to bug Arul" list :)

Sincerely,

Joshua D. Drake

> 
> Rgds,
> Arul Shaji
> 
> 
>> Joshua D. Drake
>>
>>> 1. Introduction
>>> --
>>> This is a combined proposal and design document for adding updatable
>>> (insensitive) cursor capability to the PostgreSQL database.
>>> There have already been a couple of previous proposals since 2003 for
>>> implementing this feature so there appears to be community interest in
>>> doing so. This will enable the following constructs to be processed:
>>>
>>>
>>> UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
>>> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
>>>
>>> This has the effect of users being able to update or delete specific rows
>>> of a table, as defined by the row currently fetched into the cursor.
>>>
>>>
>>> 2. Overall Conceptual Design
>>> -
>>> The design is considered from the viewpoint of progression of a command
>>> through the various stages of processing, from changes to the file
>>> ‘gram.y’ to implement the actual grammar changes, through to changes in
>>> the Executor portion of the database architecture.
>>>
>>> 2.1 Changes to the Grammar
>>> --
>>> The following changes will be done to the PostgreSQL grammar:
>>>
>>> UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
>>> DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
>>>
>>> The cursor_name data is held in the UpdateStmt and DeleteStmt structures
>>> and contains just the name of the cursor.
>>>
>>> The pl/pgsql grammar changes in the same manner.
>>>
>>> The word CURRENT will be added to the ScanKeywords array in keywords.c.
>>>
>>>
>>> 2.2 Changes to Affected Data Structures
>>> --
>>> The following data structures are affected by this change:
>>>
>>> Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
>>> structures
>>>
>>> The Portal will contain a list of structures of relation ids and tuple
>>> ids relating to the tuple held in the QueryDesc structure. There will be
>>> one entry in the relation and tuple id list for each entry in the
>>> relation-list of the statement below:
>>>
>>> DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation_list>
>>>
>>> The QueryDesc structure will contain the relation id and the tuple id
>>> relating to the tuple obtained via the FETCH command so that it can be
>>> propagated back to the Portal for storage in the list described above.
>>>
>>> The UpdateStmt and DeleteStmt structures have the cursor name added so
>>> that the information is available for use in obtaining the portal
>>> structure related to the cursor previously opened via the DECLARE CURSOR
>>> request.
>>>
>>>
>>> 2.3 Changes to the SQL Parser
>>> 
>>> At present, although the FOR UPDATE clause of the DECLARE CURSOR command
>>> has been present in the grammar, it causes an error message later in the
>>> processing since cursors are currently not updatable. This now needs to
>>> change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’
>>> clause.
>>>
>>> The relation names that follow the ‘FOR UPDATE’ clause will be added to
>>> the rtable in the Query structure and identified by means of the rowMarks
>>> array. In the case of an updatable cursor the FOR SHARE option is not
>>> allowed therefore all entries in the rtable that are identified by the
>>> rowMarks array must relate to tables that are FOR UPDATE.
>>>
>>> In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’
>>> clause results in the cursor name being placed in the UpdateStmt or
>>> DeleteStmt structure. During the processing of the functions -
>>> transformDeleteStmt() and transformUpdateStmt() - the cursor name is used
>>> to obtain a pointer to the related Portal structure and the tuple
>>> affected by the current UPDATE or DELETE statement is extracted from the
>>> Portal, where it has been placed as the result of a previous FETCH
>>> request. At this point all the information for the UPDATE or DELETE
>>> statement is available so the statements can be transformed into standard
>>> UPDATE or DELETE statements and sent for re-write/planning/execution as
>>> usual.
>>>
>>> 2.4 Changes to the Optimizer
>>> --
>>> There is a need to add a TidScan node to planning UPDATE / DELETE
>>> statements where the statements are ‘UPDATE / DELETE at position’. This
>>> is to enable the tuple ids of the tuples in the tables

Re: [HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
On Tue, 23 Jan 2007 15:48, Joshua D. Drake wrote:
> FAST PostgreSQL wrote:
> > We are trying to develop the updateable cursors functionality into
> > Postgresql. I have given below details of the design and also issues we
> > are facing.  Looking forward to the advice on how to proceed with these
> > issues.
> >
> > Rgds,
> > Arul Shaji
>
> Would this be something that you would hope to submit for 8.3?

Yes definitely. If we can finish it before the feature freeze of course.

Rgds,
Arul Shaji


> Joshua D. Drake
>
> > 1. Introduction
> > --
> > This is a combined proposal and design document for adding updatable
> > (insensitive) cursor capability to the PostgreSQL database.
> > There have already been a couple of previous proposals since 2003 for
> > implementing this feature so there appears to be community interest in
> > doing so. This will enable the following constructs to be processed:
> >
> >
> > UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
> > DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
> >
> > This has the effect of users being able to update or delete specific rows
> > of a table, as defined by the row currently fetched into the cursor.
> >
> >
> > 2. Overall Conceptual Design
> > -
> > The design is considered from the viewpoint of progression of a command
> > through the various stages of processing, from changes to the file
> > ‘gram.y’ to implement the actual grammar changes, through to changes in
> > the Executor portion of the database architecture.
> >
> > 2.1 Changes to the Grammar
> > --
> > The following changes will be done to the PostgreSQL grammar:
> >
> > UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> > DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> >
> > The cursor_name data is held in the UpdateStmt and DeleteStmt structures
> > and contains just the name of the cursor.
> >
> > The pl/pgsql grammar changes in the same manner.
> >
> > The word CURRENT will be added to the ScanKeywords array in keywords.c.
> >
> >
> > 2.2 Changes to Affected Data Structures
> > --
> > The following data structures are affected by this change:
> >
> > Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
> > structures
> >
> > The Portal will contain a list of structures of relation ids and tuple
> > ids relating to the tuple held in the QueryDesc structure. There will be
> > one entry in the relation and tuple id list for each entry in the
> > relation-list of the statement below:
> >
> > DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation_list>
> >
> > The QueryDesc structure will contain the relation id and the tuple id
> > relating to the tuple obtained via the FETCH command so that it can be
> > propagated back to the Portal for storage in the list described above.
> >
> > The UpdateStmt and DeleteStmt structures have the cursor name added so
> > that the information is available for use in obtaining the portal
> > structure related to the cursor previously opened via the DECLARE CURSOR
> > request.
> >
> >
> > 2.3 Changes to the SQL Parser
> > 
> > At present, although the FOR UPDATE clause of the DECLARE CURSOR command
> > has been present in the grammar, it causes an error message later in the
> > processing since cursors are currently not updatable. This now needs to
> > change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’
> > clause.
> >
> > The relation names that follow the ‘FOR UPDATE’ clause will be added to
> > the rtable in the Query structure and identified by means of the rowMarks
> > array. In the case of an updatable cursor the FOR SHARE option is not
> > allowed therefore all entries in the rtable that are identified by the
> > rowMarks array must relate to tables that are FOR UPDATE.
> >
> > In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’
> > clause results in the cursor name being placed in the UpdateStmt or
> > DeleteStmt structure. During the processing of the functions -
> > transformDeleteStmt() and transformUpdateStmt() - the cursor name is used
> > to obtain a pointer to the related Portal structure and the tuple
> > affected by the current UPDATE or DELETE statement is extracted from the
> > Portal, where it has been placed as the result of a previous FETCH
> > request. At this point all the information for the UPDATE or DELETE
> > statement is available so the statements can be transformed into standard
> > UPDATE or DELETE statements and sent for re-write/planning/execution as
> > usual.
> >
> > 2.4 Changes to the Optimizer
> > --
> > There is a need to add a TidScan node to planning UPDATE / DELETE
> > statements where the statements are ‘UPDATE / DELETE at position’. This
> > is to enable the tuple ids of the tuples in the tables relating to the
> > query to be obtained. There will need to be a new mechanism to achieve
> > this, as at present, a Tid scan is 

Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Jim C. Nasby wrote:
> On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
> > 4. Launcher will be a continuously-running process, akin to bgwriter;
> > connected to shared memory
>  
> So would it use up a database connection?

No.  It's connected to shared memory and has access to pgstats, but it's
not connected to any database so it's not counted.  You'd say it has the
same status as the bgwriter.

> > 5. Workers will be direct postmaster children; so postmaster will get
> > SIGCHLD when worker dies
> 
> As part of this I think we need to make it more obvious how all of this
> ties into max_connections. Currently, autovac ties up one of the
> super-user connections whenever it's not asleep; these changes would
> presumably mean that more of those connections could be tied up.

Sure.

> Rather than forcing users to worry about adjusting max_connections and
> superuser_reserved_connections to accommodate autovacuum, the system
> should handle it for them.
> 
> Were you planning on limiting the number of concurrent vacuum processes
> that could be running? If so, we could probably just increase superuser
> connections by that amount. If not, we might need to think of something
> else...

The fact that I'm currently narrowly focused on process handling means
that I don't want to touch scheduling at all for now, so I'm gonna make
it so that the launcher decides to launch a worker run only when no
other worker is running.  Thus only a single vacuum "thread" at any
time.  In the meantime you're welcome to think on the possible solutions
to that problem, which we'll have to attack at some point in the
(hopefully) near future ;-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updateable cursors

2007-01-22 Thread Joshua D. Drake
FAST PostgreSQL wrote:
> We are trying to develop the updateable cursors functionality into 
> Postgresql. I have given below details of the design and also issues we are 
> facing.  Looking forward to the advice on how to proceed with these issues.
> 
> Rgds,
> Arul Shaji

Would this be something that you would hope to submit for 8.3?

Joshua D. Drake


> 
> 
> 
> 
>  
> 1. Introduction
> --
> This is a combined proposal and design document for adding updatable 
> (insensitive) cursor capability to the PostgreSQL database. 
> There have already been a couple of previous proposals since 2003 for 
> implementing this feature so there appears to be community interest in doing 
> so. This will enable the following constructs to be processed:
> 
> 
> UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
> 
> This has the effect of users being able to update or delete specific rows of 
> a table, as defined by the row currently fetched into the cursor.
> 
> 
> 2. Overall Conceptual Design
> -
> The design is considered from the viewpoint of progression of a command 
> through the various stages of processing, from changes to the file ‘gram.y’ 
> to implement the actual grammar changes, through to changes in the Executor 
> portion of the database architecture.
> 
> 2.1 Changes to the Grammar
> --
> The following changes will be done to the PostgreSQL grammar:
> 
> UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> 
> The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
> contains just the name of the cursor.
> 
> The pl/pgsql grammar changes in the same manner.
> 
> The word CURRENT will be added to the ScanKeywords array in keywords.c.
> 
> 
> 2.2 Changes to Affected Data Structures
> --
> The following data structures are affected by this change: 
> 
> Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
> structures
> 
> The Portal will contain a list of structures of relation ids and tuple ids 
> relating to the tuple held in the QueryDesc structure. There will be one 
> entry in the relation and tuple id list for each entry in the relation-list 
> of the statement below: 
> 
> DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation_list> 
> 
> The QueryDesc structure will contain the relation id and the tuple id 
> relating to the tuple obtained via the FETCH command so that it can be 
> propagated back to the Portal for storage in the list described above.
> 
> The UpdateStmt and DeleteStmt structures have the cursor name added so that 
> the information is available for use in obtaining the portal structure 
> related to the cursor previously opened via the DECLARE CURSOR request.
> 
> 
> 2.3 Changes to the SQL Parser
> 
> At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
> been present in the grammar, it causes an error message later in the 
> processing since cursors are currently not updatable. This now needs to 
> change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
> clause. 
> 
> The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
> rtable in the Query structure and identified by means of the rowMarks array. 
> In the case of an updatable cursor the FOR SHARE option is not allowed 
> therefore all entries in the rtable that are identified by the rowMarks array 
> must relate to tables that are FOR UPDATE.
> 
> In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’ 
> clause results in the cursor name being placed in the UpdateStmt or 
> DeleteStmt structure. During the processing of the functions - 
> transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
> obtain a pointer to the related Portal structure and the tuple affected by 
> the current UPDATE or DELETE statement is extracted from the Portal, where it 
> has been placed as the result of a previous FETCH request. At this point all 
> the information for the UPDATE or DELETE statement is available so the 
> statements can be transformed into standard UPDATE or DELETE statements and 
> sent for re-write/planning/execution as usual.
> 
> 2.4 Changes to the Optimizer
> --
> There is a need to add a TidScan node to planning UPDATE / DELETE statements 
> where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
> tuple ids of the tuples in the tables relating to the query to be obtained. 
> There will need to be a new mechanism to achieve this, as at present, a Tid 
> scan is done only if there is a standard WHERE condition on update or delete 
> statements to provide Tid qualifier data.
> 
> 
> 2.5 Changes to the Executor
> ---
> There are various options that have been considered for this part of the 
> enhancement. Th

[HACKERS] Updateable cursors

2007-01-22 Thread FAST PostgreSQL
We are trying to develop the updateable cursors functionality into 
Postgresql. I have given below details of the design and also issues we are 
facing.  Looking forward to the advice on how to proceed with these issues.

Rgds,
Arul Shaji




 
1. Introduction
--
This is a combined proposal and design document for adding updatable 
(insensitive) cursor capability to the PostgreSQL database. 
There have already been a couple of previous proposals since 2003 for 
implementing this feature so there appears to be community interest in doing 
so. This will enable the following constructs to be processed:


UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>

This has the effect of users being able to update or delete specific rows of 
a table, as defined by the row currently fetched into the cursor.
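
As an illustration only (table, column, and cursor names below are made up,
and the WHERE CURRENT OF syntax is the proposed feature, not something that
exists today), a session using the feature might look like:

BEGIN;
DECLARE c_emp CURSOR FOR
    SELECT id, salary FROM employees FOR UPDATE OF employees;
FETCH NEXT FROM c_emp;
UPDATE employees SET salary = salary * 1.10 WHERE CURRENT OF c_emp;
FETCH NEXT FROM c_emp;
DELETE FROM employees WHERE CURRENT OF c_emp;
COMMIT;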


2. Overall Conceptual Design
-
The design is considered from the viewpoint of progression of a command 
through the various stages of processing, from changes to the file ‘gram.y’ 
to implement the actual grammar changes, through to changes in the Executor 
portion of the database architecture.

2.1 Changes to the Grammar
--
The following changes will be done to the PostgreSQL grammar:

UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added

The cursor_name data is held in the UpdateStmt and DeleteStmt structures and 
contains just the name of the cursor.

The pl/pgsql grammar changes in the same manner.

The word CURRENT will be added to the ScanKeywords array in keywords.c.


2.2 Changes to Affected Data Structures
--
The following data structures are affected by this change: 

Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt 
structures

The Portal will contain a list of structures of relation ids and tuple ids 
relating to the tuple held in the QueryDesc structure. There will be one 
entry in the relation and tuple id list for each entry in the relation-list 
of the statement below: 

DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation_list> 

The QueryDesc structure will contain the relation id and the tuple id 
relating to the tuple obtained via the FETCH command so that it can be 
propagated back to the Portal for storage in the list described above.

The UpdateStmt and DeleteStmt structures have the cursor name added so that 
the information is available for use in obtaining the portal structure 
related to the cursor previously opened via the DECLARE CURSOR request.


2.3 Changes to the SQL Parser

At present, although the FOR UPDATE clause of the DECLARE CURSOR command has 
been present in the grammar, it causes an error message later in the 
processing since cursors are currently not updatable. This now needs to 
change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’ 
clause. 

The relation names that follow the ‘FOR UPDATE’ clause will be added to the 
rtable in the Query structure and identified by means of the rowMarks array. 
In the case of an updatable cursor the FOR SHARE option is not allowed 
therefore all entries in the rtable that are identified by the rowMarks array 
must relate to tables that are FOR UPDATE.

In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’ 
clause results in the cursor name being placed in the UpdateStmt or 
DeleteStmt structure. During the processing of the functions - 
transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to 
obtain a pointer to the related Portal structure and the tuple affected by 
the current UPDATE or DELETE statement is extracted from the Portal, where it 
has been placed as the result of a previous FETCH request. At this point all 
the information for the UPDATE or DELETE statement is available so the 
statements can be transformed into standard UPDATE or DELETE statements and 
sent for re-write/planning/execution as usual.

2.4 Changes to the Optimizer
--
There is a need to add a TidScan node to planning UPDATE / DELETE statements 
where the statements are ‘UPDATE / DELETE at position’. This is to enable the 
tuple ids of the tuples in the tables relating to the query to be obtained. 
There will need to be a new mechanism to achieve this, as at present, a Tid 
scan is done only if there is a standard WHERE condition on update or delete 
statements to provide Tid qualifier data.


2.5 Changes to the Executor
---
There are various options that have been considered for this part of the 
enhancement. These are described in the sections below.

We would like to hear opinions on which option is the best way to go, or, if 
none of these is acceptable, any alternative ideas?

Option 1  MVCC Via Continuous Searching of Database

The Executor is to be changed in the following ways:
1)  When the FETCH statement is executed the id 

Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread ITAGAKI Takahiro

Heikki Linnakangas <[EMAIL PROTECTED]> wrote:

> Vacuum is done in 3 phases:
> 1. Scan heap
> 2. Vacuum index
> 3. Vacuum heap

> A variation of the scheme would be to keep scanning pages that are in 
> cache, until the tid list reaches a predefined size, instead of keeping 
> track of which pages have already been seen. That would deal better with 
> tables with hot and cold spots, but it couldn't advance the relfrozenid 
> because there would be no guarantee that all pages are visited. Also, we 
> could start 1st phase of the next vacuum, while we're still in the 3rd 
> phase of previous one.

ISTM, it is another DSM that has a tuple-level accuracy, not a page-level.
One of the benefits is that we can skip the 1st phase of vacuum; We will
have a TID list of dead tuples at the start of vacuum, so we can start
from 2nd phase.

I have another idea for use of TID lists -- store the TIDs after the 1st
or 2nd phase, and exit the vacuum. At the next vacuum, we will do both
the previous 3rd phase and the new 1st phase at once, so that I/Os are reduced
to (ndirtyblocks + nindexblocks) from (2*ndirtyblocks + nindexblocks) on
average. We already use a similar method in vacuuming btree indexes
to collect recyclable empty pages.

I think piggybacking of I/Os is very useful. The buffer manager helps us
fold up some of the I/Os, but explicit requests are more effective.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Default permissisons from schemas

2007-01-22 Thread Stephen Frost
Greetings,

* Stephen Frost ([EMAIL PROTECTED]) wrote:
> It seems unlikely that I'm going to have time at the rate things are
> going but I was hoping to take a whack at default permissions/ownership
> by schema.  Kind of a umask-type thing but for schemas instead of roles
> (though I've thought about it per role and that might also solve the
> particular problem we're having atm).

  Following up on my reply to Joshua, what I'd like to propose is, for
  comments and suggestions:

  ALTER SCHEMA name [ [ WITH ] [ DEFAULT ] option [ ... ] ]

  where option can be:

{ GRANT { { SELECT | INSERT | UPDATE | DELETE | RULE | REFERENCES | TRIGGER 
| EXECUTE } 
  [,...] | ALL [ PRIVILEGES ] } 
  TO { role | PUBLIC } [, ...] [ WITH GRANT OPTION ] 
} [, ...]

OWNER role

  pg_namespace would be modified to have two new columns, 
  nspdefowner oid, and nspdefacl aclitem[].  When NULL these would have
  no effect.  When not-null the 'nspdefowner' would be the owner of all
  objects created in the schema.  When not-null the 'nspdefacl' would be
  the initial acl for the object (modified for what grants are valid for
  the specific type of object).  These can only be changed by the schema
  owner and the 'OWNER role' must have create permissions in the schema.
  Ideally this would be checked when the ALTER SCHEMA is issued and then
  a dependency created for that.  If that's not possible today then the
  rights check would be done when an object creation is attempted,
  possibly with a fall-back to check the current user's rights.

  The defaults would be NULL for these so there would be no change in
  behaviour unless specifically asked for.
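
  As a purely illustrative sketch of the proposed (not yet existing) syntax,
  using made-up schema and role names, so that tables later created in
  schema app end up owned by app_owner and readable by role reporting:

ALTER SCHEMA app WITH DEFAULT
    GRANT SELECT TO reporting
    OWNER app_owner;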

  I believe this would cover the following to-do item:
  Allow GRANT/REVOKE permissions to be inherited by objects based on
  schema permissions

  Comments?

Thanks,

Stephen




Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
> Thought I would do a poll of what is happening in the world for 8.3. I have:

Another thing which was mentioned previously which I'd really like to
see happen (and was discussed on the list...) is replacing the Kerberos
support with GSSAPI support and adding support for SSPI.  Don't recall
who had said they were looking into working on it though..

Thanks,

Stephen




Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
> Thought I would do a poll of what is happening in the world for 8.3. I have:

It seems unlikely that I'm going to have time at the rate things are
going but I was hoping to take a whack at default permissions/ownership
by schema.  Kind of a umask-type thing but for schemas instead of roles
(though I've thought about it per role and that might also solve the
particular problem we're having atm).

Thanks,

Stephen




Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread ITAGAKI Takahiro

"Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> Thought I would do a poll of what is happening in the world for 8.3. I have:
> 
> Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
> Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
> Jonah Harris: WITH/Recursive Queries?
> Andrei Kovalesvki: Some Win32 work with Magnus
> Magnus Hagander: VC++ support (thank goodness)
> Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
> Oleg Bartunov: Tsearch2 in core
> Neil Conway: Patch Review (including enums), pg_fcache

I'm working on the Dead Space Map and load distribution of checkpoints.
I intend to get them done for 8.3.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Jeff Davis
On Mon, 2007-01-22 at 14:16 -0800, Joshua D. Drake wrote:
> I am sure there are more, the ones with question marks are unknowns but
> heard of in the ether somewhere. Any additions or confirmations?
> 

I'd still like to make an attempt at my Synchronized Scanning patch.

If freeze is 10 weeks away, I better get some more test results posted
soon, however. 

Regards,
Jeff Davis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Would this SPI function be useful?

2007-01-22 Thread Jeff Davis

Would it be useful to have an SPI function that returns the OID and
namespace of the function being executed?

The reason I bring this up is due to a discussion on the PostGIS lists
about making the installation able to work in any namespace from one
generic SQL script.

The problem they have is that, because the functions don't know what
schema they reside in, they don't know how to call out to other
functions in the same namespace.
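
To make the problem concrete, a minimal hypothetical sketch (function names
invented): a generic install script would like helper_b() to call helper_a()
in whatever schema the script happens to be installed into, but today it has
to hard-code one:

CREATE FUNCTION helper_a(x int) RETURNS int AS 'SELECT $1 + 1' LANGUAGE sql;
CREATE FUNCTION helper_b(x int) RETURNS int
    AS 'SELECT public.helper_a($1) * 2'  -- "public" is just a guess
    LANGUAGE sql;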

It might be useful to have a few basic functions that allow developers
of add-ons to access information like that. What come to my mind are:

SPI_get_my_oid();
SPI_get_my_nspname();

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Strange file in snapshot tarball

2007-01-22 Thread ITAGAKI Takahiro

Michael Meskes <[EMAIL PROTECTED]> wrote:

> > There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
> > under the path 'src/interfaces/ecpg/test/expected/' in the recent
> > ftp snapshot (postgresql-snapshot.tar.gz).
> 
> Would you please care to explain? I do not understand what you are
> saying. 

Oops, sorry. It was my extractor's fault.
That file is the only one in the tarball whose name is longer than
100 characters, and my extractor does not support long file names.

I'm sorry for all the fuss.

---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] -f option for pg_dumpall

2007-01-22 Thread Andrew Dunstan
elein wrote:
> On Mon, Jan 15, 2007 at 10:13:16AM -0500, Andrew Dunstan wrote:
>>
>>
>> Neil Conway wrote:
>> >On Thu, 2007-01-11 at 14:36 -0500, Neil Conway wrote:
>> >
>> >>I don't think they need to be integrated any time soon, but if we were
>> >>to design pg_dump and pg_dumpall from scratch, it seems more logical
>> to
>> >>use a single program
>> >>
>> >
>> >On thinking about this some more, it might be useful to factor much of
>> >pg_dump's logic for reconstructing the state of a database into a
>> shared
>> >library. This would make it relatively easy for developers to plug new
>> >archive formats into the library (in addition to the present 3 archive
>> >formats), or to make use of this functionality in other applications
>> >that want to reconstruct the logical state of a database from the
>> >content of the system catalogs. We could then provide a client app
>> >implemented on top of the library that would provide similar
>> >functionality to pg_dump.
>> >
>> >Moving pg_dump's functionality into the backend has been suggested in
>> >the past (and rejected for good reason), but I think this might be a
>> >more practical method for making the pg_dump logic more easily
>> reusable.
>> >
>> >
>> >
>>
>> I like this idea. For example, we might usefully map some of this to
>> psql \ commands, without having to replicate the underlying logic.
>
> Don't we already do this with the .psqlrc file?
>

No. \ commands are implemented in C code.

cheers

andrew


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] -f option for pg_dumpall

2007-01-22 Thread elein
On Mon, Jan 15, 2007 at 10:13:16AM -0500, Andrew Dunstan wrote:
> 
> 
> Neil Conway wrote:
> >On Thu, 2007-01-11 at 14:36 -0500, Neil Conway wrote:
> >  
> >>I don't think they need to be integrated any time soon, but if we were
> >>to design pg_dump and pg_dumpall from scratch, it seems more logical to
> >>use a single program
> >>
> >
> >On thinking about this some more, it might be useful to factor much of
> >pg_dump's logic for reconstructing the state of a database into a shared
> >library. This would make it relatively easy for developers to plug new
> >archive formats into the library (in addition to the present 3 archive
> >formats), or to make use of this functionality in other applications
> >that want to reconstruct the logical state of a database from the
> >content of the system catalogs. We could then provide a client app
> >implemented on top of the library that would provide similar
> >functionality to pg_dump.
> >
> >Moving pg_dump's functionality into the backend has been suggested in
> >the past (and rejected for good reason), but I think this might be a
> >more practical method for making the pg_dump logic more easily reusable.
> >
> >
> >  
> 
> I like this idea. For example, we might usefully map some of this to 
> psql \ commands, without having to replicate the underlying logic.

Don't we already do this with the .psqlrc file?

--elein


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread korryd


> Thought I would do a poll of what is happening in the world for 8.3. I have:
> 
> Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
> Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
> Jonah Harris: WITH/Recursive Queries?
> Andrei Kovalesvki: Some Win32 work with Magnus
> Magnus Hagander: VC++ support (thank goodness)
> Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
> Oleg Bartunov: Tsearch2 in core
> Neil Conway: Patch Review (including enums), pg_fcache



Korry Douglas: PL/pgSQL debugger (and probably a PL/pgSQL execution profiler as 
well)




--
  Korry Douglas[EMAIL PROTECTED]
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Steve Atkins


On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:


Bruce Momjian wrote:
Yep, agreed on the random I/O issue.  The larger question is if you have a
huge table, do you care to reclaim 3% of the table size, rather than just
vacuum it when it gets to 10% dirty?  I realize the vacuum is going to take
a lot of time, but vacuuming to reclaim 3% three times seems like it is
going to be more expensive than just vacuuming the 10% once.  And vacuuming
to reclaim 1% ten times seems even more expensive.  The partial vacuum idea
is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than  
just paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you  
can at least do something useful with a large table if a full  
vacuum takes 24 hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way  
back when isn't directly addressing the issue. What most people  
want (I'm guessing) is to vacuum *when the user-workload allows*  
and the time-tabling is just a sysadmin first-approximation at that.


Yup. I'd really like for my app to be able to say "Hmm. No  
interactive users at the moment, no critical background tasks. Now  
would be a really good time for the DB to do some maintenance." but  
also to be able to interrupt the maintenance process if some new  
users or other system load show up.


With partial vacuuming possible, we can arrange things with just  
three thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to  
the thresholds and starts a new vacuum, kills one or does nothing.  
New vacuum processes keep getting started as long as there is  
workload spare and tables that need vacuuming.


Now the trick of course is how you measure system workload in a  
meaningful manner.


I'd settle for a "start maintenance", "stop maintenance" API.  
Anything else (for instance the heuristics you suggest above) would  
definitely be gravy.


It's not going to be simple to do, though, I don't think.

Cheers,
  Steve


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 04:24:28PM -0300, Alvaro Herrera wrote:
> 4. Launcher will be a continuously-running process, akin to bgwriter;
> connected to shared memory
 
So would it use up a database connection?

> 5. Workers will be direct postmaster children; so postmaster will get
> SIGCHLD when worker dies

As part of this I think we need to make it more obvious how all of this
ties into max_connections. Currently, autovac ties up one of the
super-user connections whenever it's not asleep; these changes would
presumably mean that more of those connections could be tied up.

Rather than forcing users to worry about adjusting max_connections and
superuser_reserved_connections to accommodate autovacuum, the system
should handle it for them.
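
Today that means hand-tuning postgresql.conf; a rough sketch of the kind of
adjustment users currently have to reason about (values purely illustrative):

max_connections = 100
superuser_reserved_connections = 4    # leave headroom for autovacuum too

The suggestion is that the server should account for its own workers instead
of pushing this onto the administrator.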

Were you planning on limiting the number of concurrent vacuum processes
that could be running? If so, we could probably just increase superuser
connections by that amount. If not, we might need to think of something
else...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-patches] [HACKERS] Win32 WEXITSTATUS too

2007-01-22 Thread Bruce Momjian
bruce wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > Bruce Momjian wrote:
> > >> OK, maybe /doc or src/tools.  A more radical approach would be to put
> > >> the list in our documentation, or have initdb install it.
> > 
> > > Why not put it in techdocs or some such?
> > 
> > I think we've learned by now that putting copies of other peoples' code
> > in our tree isn't such a hot idea; what is going to cause it to be
> > updated when things change?  How do you know the values are even the
> > same across all the Windows versions we support?
> > 
> > Basically this whole idea is misconceived.  Just print the number and
> > have done.
> 
> And how do people interpret that number?

Ah, I found something:

http://support.microsoft.com/kb/259693

Someone on IRC says that is kernel mode only, and is looking for a
user-mode version, so we would be able to print out a meaningful message
rather than a hex value that has to be looked up.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 03:40:17PM +, Simon Riggs wrote:
> This would add a table-specific ANALYZE statement following each table's
> actions.

It'd probably be best to put it before any index creating activities,
since there's a better chance of everything from the table being in
shared buffers.

Better yet would be if COPY could analyze data as it was loaded in...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
> Gregory Stark wrote:
> > 
> > Actually no. A while back I did experiments to see how fast reading a file
> > sequentially was compared to reading the same file sequentially but skipping
> > x% of the blocks randomly. The results were surprising (to me) and 
> > depressing.
> > The breakeven point was about 7%. [...]
> > 
> > The theory online was that as long as you're reading one page from each disk
> > track you're going to pay the same seek overhead as reading the entire 
> > track.
> 
> Could one take advantage of this observation in designing the DSM?
> 
> Instead of a separate bit representing every page, having each bit
> represent 20 or so pages might be a more useful unit.  It sounds
> like the time spent reading would be similar; while the bitmap
> would be significantly smaller.

If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
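
(For scale, rough arithmetic on the quoted suggestion, assuming 8 KB pages: a
100 GB table is about 13 million pages, so one bit per page is roughly 1.6 MB
of map, while one bit per 20 pages is on the order of 80 KB.)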
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Jim C. Nasby
On Mon, Jan 22, 2007 at 02:51:47PM +, Heikki Linnakangas wrote:
> I've been looking at the way we do vacuums.
> 
> The fundamental performance issue is that a vacuum generates 
> nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to 
> spread the cost like part payment, but the total is the same. In an I/O 
> bound system, the extra I/O directly leads to less throughput.
> 
> Therefore, we need to do less I/O. Dead space map helps by allowing us 
> to skip blocks that don't need vacuuming, reducing the # of I/Os to 
> 2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the 
> dead tuples are spread uniformly.
> 
> If we could piggyback the vacuum I/Os to the I/Os that we're doing 
> anyway, vacuum wouldn't ideally have to issue any I/O of its own. I've 
> tried to figure out a way to do that.
> 
> Vacuum is done in 3 phases:
> 
> 1. Scan heap
> 2. Vacuum index
> 3. Vacuum heap
 


> Instead of doing a sequential scan, we could perform the 1st phase by 
> watching the buffer pool, scanning blocks for dead tuples when they're 
> in memory and keeping track of which pages we've seen. When all pages 
> have been seen, the tid list is sorted and 1st phase is done.
> 
> In theory, the index vacuum could also be done that way, but let's 
> assume for now that indexes would be scanned like they are currently.
> 
> The 3rd phase can be performed similarly to the 1st phase. Whenever a 
> page enters the buffer pool, we check the tid list and remove any 
> matching tuples from the page. When the list is empty, vacuum is complete.

Is there any real reason to demark the start and end of a vacuum? Why
not just go to a continuous process? One possibility is to keep a list
of TIDs for each phase, though that could prove tricky with multiple
indexes.

> A variation of the scheme would be to keep scanning pages that are in 
> cache, until the tid list reaches a predefined size, instead of keeping 
> track of which pages have already been seen. That would deal better with 
> tables with hot and cold spots, but it couldn't advance the relfrozenid 
> because there would be no guarantee that all pages are visited. Also, we 
> could start 1st phase of the next vacuum, while we're still in the 3rd 
> phase of previous one.
 
What if we tracked freeze status on a per-page basis? Perhaps track the
minimum XID that's on each page. That would allow us to ensure that we
freeze pages that are approaching XID wrap.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] send password to pg_dump

2007-01-22 Thread Andrew Dunstan

shaunc wrote:

Hello,

I'm trying to run pg_dump programmatically, and I'm wondering how I can send
it a password.

I'm running it with a system() call in a child process, and sending the
password in from the parent process on stdin, but somehow pg_dump always
finds out how to ask my terminal for a password. How do I fool it?

(I need this to work in linux... in python would be preferable too.)


  


1. this is the wrong forum for this question - here we discuss 
postgresql developments, not usage
2. use a pgpass file, possibly with a PGPASSFILE environment setting as 
well - see the docs for details and the sketch below. This works on all 
platforms and should work with anything at all calling pg_dump.
3. In general, the only way to "fool" programs properly this way is to 
use expect or some equivalent that uses ptys. In this case, it would be 
overkill.
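
A minimal sketch of the pgpass approach (host, database, user and password
values are made up; the file must not be group- or world-readable or libpq
will ignore it):

# ~/.pgpass, one line per entry: host:port:database:username:password
db.example.com:5432:mydb:shaun:s3cret

$ chmod 0600 ~/.pgpass
$ PGPASSFILE=$HOME/.pgpass pg_dump -h db.example.com -U shaun mydb > mydb.sql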


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Oleg Bartunov

On Mon, 22 Jan 2007, Joshua D. Drake wrote:


Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core


Teodor Sigaev should be here !


Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more, the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake






Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] send password to pg_dump

2007-01-22 Thread shaunc

Hello,

I'm trying to run pg_dump programmatically, and I'm wondering how I can send
it a password.

I'm running it with a system() call in a child process, and sending the
password in from the parent process on stdin, but somehow pg_dump always
finds out how to ask my terminal for a password. How do I fool it?

(I need this to work in linux... in python would be preferable too.)

Thanks for any advice.

- Shaun
-- 
View this message in context: 
http://www.nabble.com/send-password-to-pg_dump-tf3048210.html#a8473174
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [BUGS] BUG #2907: pg_get_serial_sequence quoting

2007-01-22 Thread Adriaan van Os

Tom Lane wrote:

Bruce Momjian <[EMAIL PROTECTED]> writes:

I presume the reason for that is that the first paramater can be
qualified:
select pg_get_serial_sequence('"public"."FOO"', 'Ff1');



Would someone explain why qualification makes us lowercase the first
parameter by default?  I don't understand it well enough to document it.


The point is that we have to parse the first parameter, whereas the
second one can be taken literally.


It still looks inconsistent and ugly. I think the design mistake of pg_get_serial_sequence is that 
it takes two parameters rather than one (a fully qualified doublequoted columnname path) or three 
(optionally empty schema, tablename, columnname, all three literal).
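
For reference, a short sketch of the asymmetry under discussion (assuming a
table "FOO" in schema public with a serial column "Ff1"):

-- the first argument is parsed as an SQL name, so mixed case needs quotes:
SELECT pg_get_serial_sequence('"public"."FOO"', 'Ff1');
-- the second argument is taken literally, so quoting it there would make it
-- look for a column whose name actually contains the quote characters:
SELECT pg_get_serial_sequence('"public"."FOO"', '"Ff1"');  -- fails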


Regards,

Adriaan van Os

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 07:24:20PM +, Heikki Linnakangas wrote:
> Kenneth Marshall wrote:
> >On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
> >>Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
> >
> >There may be other functions that could leverage a similar sort of
> >infrastructure. For example, a long DB mining query could be registered
> >with the system. Then as the pieces of the table/database are brought in
> >to shared memory during the normal daily DB activity they can be acquired
> >without forcing the DB to run a very I/O expensive query when waiting a
> >bit for the results would be acceptable. As long as we are thinking
> >piggyback.
> 
> Yeah, I had the same idea when we discussed synchronizing sequential 
> scans. The biggest difference is that with queries, there's often a user 
> waiting for the query to finish, but with vacuum we don't care so much 
> how long it takes.
> 
Yes, but with trending and statistical analysis you may not need the
exact answer ASAP. An approximate answer based on a fraction of the
information would be useful. Also, "what if" queries could be run without
impacting the production uses of a database. One might imagine having a
query with results that "converge" as the table is processed during normal
use.

Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 13:41 +, Heikki Linnakangas wrote:
> Any thoughts before I start experimenting?

Probably only to detail the various use cases we are discussing.

My thoughts on various use cases are:

- small table with frequent update/delete, heap and indexes all/mostly
cached
e.g. Counter tables, DBT2: District/Warehouse TPC-C, pgbench:
Branches/Tellers
Current VACUUM works well in this situation, since the only I/O incurred
is the WAL written for the VACUUM. VACUUM very cheap even if not in
cache because of sequential I/O. Keeping track of whether there are hot
spots in these tables seems like a waste of cycles and could potentially
introduce contention and hence reduce performance. These need to be very
frequently VACUUMed, even when other VACUUMs are required.
My current view: just need multiple concurrent autovacuum processes.

- large table with severe hotspots
e.g. DBT2: NewOrder, larger queue-style tables
The hotspots are likely to be in cache and the not-so-hotspots might or
might not be in cache, but we don't care either way. DSM concept works
well for this case, since we are able to avoid lots of I/O by
appropriate book-keeping. Works well for removing rows after a file-scan
DELETE, as well as for DELETE or UPDATE hot spots. 
My current view: DSM would be great for this

- large table with few hotspots
e.g. DBT2: Stock, pgbench: Accounts, most Customer tables
Current VACUUM works very badly in this case, since updates are sparsely
distributed across the table. DSM wouldn't help either unless we
differentiate between few/many updates to a block. 
My current view: Piggyback concept seems on the right track, but clearly
needs further thought.

Currently we have only one technique for garbage collection, plus one
process to perform it. We need multiple techniques executed by multiple
processes, when required, plus some way of automatically selecting which
is appropriate depending upon the use case. Yes, automatic :-)

DSM and this piggyback idea need not be thought of as competing
techniques.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Kenneth Marshall
On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:
> On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
> 
> Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
> 
> -- 
>   Simon Riggs 
>   EnterpriseDB   http://www.enterprisedb.com
> 

There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.

Ken


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Jonah H. Harris

On 1/22/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

Jonah Harris: WITH/Recursive Queries?


Yup, just talked with Bruce about this last week.  Working on the design now.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-22 Thread Joshua D. Drake
Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core
Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more, the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] autovacuum process handling

2007-01-22 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:


[snip details]


Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.


I can't really speak to the PostgreSQL signaling innards, but this sounds 
logical to me.  I think having the worker processes be children of the 
postmaster and having them be single-minded (or single-tasked) also 
makes a lot of sense.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Joris Dobbelsteen
>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark
>Sent: maandag 22 januari 2007 19:41
>To: Bruce Momjian
>Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen; 
>Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan 
>Deolasee; Christopher Browne; pgsql-general@postgresql.org; 
>pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements
>
>"Bruce Momjian" <[EMAIL PROTECTED]> writes:
>
>> Yep, agreed on the random I/O issue.  The larger question is if you 
>> have a huge table, do you care to reclaim 3% of the table 
>size, rather 
>> than just vacuum it when it gets to 10% dirty?  I realize the vacuum 
>> is going to take a lot of time, but vacuuming to relaim 3% 
>three times 
>> seems like it is going to be more expensive than just vacuuming the 
>> 10% once.  And vacuuming to reclaim 1% ten times seems even more 
>> expensive.  The partial vacuum idea is starting to look like 
>a loser to me again.
>
>Well the answer is of course "that depends".
>
>If you maintain the dead space at a steady state averaging 
>1.5% instead of 5% your table is 3.33% smaller on average. If 
>this is a DSS system that will translate into running your 
>queries 3.33% faster. It will take a lot of vacuums before 
>they hurt more than a 3%+ performance drop.

Good, this means a DSS system will mostly do table scans (right?). So
probably you should witness the 'table scan' statistic and rows fetched
approaching the end of the universe (at least compared to
inserts/updates/deletes)?

>If it's an OLTP system the it's harder to figure. a 3.33% 
>increase in data density will translate to a higher cache hit 
>rate but how much higher depends on a lot of factors. In our 
>experiments we actually got bigger boost in these kinds of 
>situations than the I expected (I expected comparable to the 
>3.33% improvement). So it could be even more than 3.33%. But 
>like said it depends.
>If you already have the whole database cache you won't see any 
>improvement. If you are right on the cusp you could see a huge benefit.

These tables have high insert, update and delete rates, probably a lot
of index scans? I believe the workload on table scans should be (close
to) none.

Are you willing to share some of this measured data? I'm quite
interested in such figures.

>It sounds like you're underestimating the performance drain 
>10% wasted space has. If we found out that one routine was 
>unnecessarily taking 10% of the cpu time it would be an 
>obvious focus of attention. 10% wasted space is going to work 
>out to about 10% of the i/o time.
>
>It also sounds like we're still focused on the performance 
>impact in absolute terms. I'm much more interested in changing 
>the performance characteristics so they're predictable and 
>scalable. It doesn't matter much if your 1kb table is 100% 
>slower than necessary but it does matter if your 1TB table 
>needs 1,000x as much vacuuming as your 1GB table even if it's 
>getting the same update traffic.

Or rather, the vacuuming should pay back.
A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming.
Obviously, the higher the better.

- Joris Dobbelsteen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Ron Mayer
Gregory Stark wrote:
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%. [...]
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.

Could one take advantage of this observation in designing the DSM?

Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit.  It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
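
To make that concrete, here is a small self-contained sketch of such a
coarse-grained map; the 20-page granularity and all names are only
illustrative assumptions, not an actual PostgreSQL structure. For a 1 TB
table of 8 kB pages this shrinks the map from roughly 16 MB at one bit per
page to under 1 MB at one bit per 20 pages.

#include <stdlib.h>

#define DSM_GRANULARITY 20      /* heap pages covered by one bit (assumption) */

typedef struct CoarseDSM
{
    size_t          nbits;      /* nheappages / DSM_GRANULARITY, rounded up */
    unsigned char  *bits;
} CoarseDSM;

static CoarseDSM *
dsm_create(size_t nheappages)
{
    CoarseDSM  *dsm = malloc(sizeof(CoarseDSM));

    dsm->nbits = (nheappages + DSM_GRANULARITY - 1) / DSM_GRANULARITY;
    dsm->bits = calloc((dsm->nbits + 7) / 8, 1);
    return dsm;
}

/* Mark the chunk containing heap block 'blkno' as possibly needing vacuum. */
static void
dsm_mark_dirty(CoarseDSM *dsm, size_t blkno)
{
    size_t      bit = blkno / DSM_GRANULARITY;

    dsm->bits[bit / 8] |= (unsigned char) (1 << (bit % 8));
}

/* Should vacuum visit the 20-page chunk containing heap block 'blkno'? */
static int
dsm_chunk_is_dirty(const CoarseDSM *dsm, size_t blkno)
{
    size_t      bit = blkno / DSM_GRANULARITY;

    return (dsm->bits[bit / 8] >> (bit % 8)) & 1;
}

Vacuum would then read each dirty chunk as one short sequential stretch,
which fits the observation above that touching one page on a track costs
about as much as reading the whole track.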

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > 
> > Yep, agreed on the random I/O issue.  The larger question is if you have
> > a huge table, do you care to reclaim 3% of the table size, rather than
> > just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> > to take a lot of time, but vacuuming to relaim 3% three times seems like
> > it is going to be more expensive than just vacuuming the 10% once.  And
> > vacuuming to reclaim 1% ten times seems even more expensive.  The
> > partial vacuum idea is starting to look like a loser to me again.
> 
> But if the partial vacuum is able to clean the busiest pages and reclaim
> useful space, currently-running transactions will be able to use that
> space and thus not have to extend the table.  Not that extension is a
> problem on itself, but it'll keep your working set smaller.

Yes, but my point is that if you are trying to avoid vacuuming the
table, I am afraid the full index scan is going to be painful too.  I
can see corner cases where partial vacuum is a win (I only have 4 hours
of idle I/O), but for the general case I am still worried that partial
vacuum will not be that useful as long as we have to scan the indexes.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Kenneth Marshall wrote:

On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...


There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.


Yeah, I had the same idea when we discussed synchronizing sequential 
scans. The biggest difference is that with queries, there's often a user 
waiting for the query to finish, but with vacuum we don't care so much 
how long it takes.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] autovacuum process handling

2007-01-22 Thread Alvaro Herrera
Hi,

This is how I think autovacuum should change with an eye towards being
able to run multiple vacuums simultaneously:

1. There will be two kinds of processes, "autovacuum launcher" and
"autovacuum worker".

2. The launcher will be in charge of scheduling and will tell workers
what to do

3. The workers will be similar to what autovacuum does today: start when
somebody else tells it to start, process a single item (be it a table or
a database) and terminate

4. Launcher will be a continuously-running process, akin to bgwriter;
connected to shared memory

5. Workers will be direct postmaster children; so postmaster will get
SIGCHLD when worker dies

6. Launcher will start a worker using the following protocol:
   - Set up information on what to run on shared memory
   - invoke SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER)
   - Postmaster will react by starting a worker, and registering it very
 similarly to a regular backend, so it can be shut down easily when
 appropriate.
 (Thus launcher will not be informed right away when worker dies)
   - Worker will examine shared memory to know what to do, clear the
 request, and send a signal to Launcher
   - Launcher wakes up and can start another one if appropriate
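
To make step 6 concrete, a rough sketch in C follows. Apart from
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER), which is quoted from
the list above, every name here is an illustrative assumption, and the
sketch is not meant to compile against the backend as-is.

/* Request slot in shared memory (hypothetical layout) */
typedef struct AutoVacRequest
{
    Oid     avr_database;   /* database the worker should process */
    Oid     avr_relation;   /* table to vacuum, or InvalidOid for "pick one" */
    bool    avr_in_use;     /* set by launcher, cleared by worker */
} AutoVacRequest;

AutoVacRequest *AutoVacShmem;   /* assumed to be set up at shmem init time */

/* Launcher side: "set up information" and "invoke SendPostmasterSignal" */
static void
avlauncher_request_worker(Oid dbid, Oid relid)
{
    AutoVacShmem->avr_database = dbid;
    AutoVacShmem->avr_relation = relid;
    AutoVacShmem->avr_in_use = true;
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
    /* launcher now sleeps; the worker's signal (last two steps above)
     * wakes it up so it can start another one if appropriate */
}

/* Worker side: run early, after the postmaster has forked and registered us */
static void
avworker_claim_request(Oid *dbid, Oid *relid)
{
    *dbid = AutoVacShmem->avr_database;
    *relid = AutoVacShmem->avr_relation;
    AutoVacShmem->avr_in_use = false;   /* clear the request ... */
    signal_autovac_launcher();          /* ... and wake the launcher; the
                                         * actual signalling mechanism is
                                         * left unspecified here */
}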

Does this raise some red flags?  It seems straightforward enough to me;
I'll submit a patch implementing this, so that scheduling will continue
to be as it is today.  Thus the scheduling discussions are being
deferred until they can be actually useful and implementable.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

> i think so...Martijn said it best: you can 'rollback' to, but you
> can't 'commit' to.  The 'commit to' would be the arguably much more
> useful way of disposing of a savepoint.  But that should be taken up
> with sql standards committee :(.

You can RELEASE a savepoint though.


not following an error.  RELEASE serves absolutely no purpose
whatsoever. It's like the sql equivalent of an assembly NOP...wastes
cpu cycles for no reason.

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Gregory Stark wrote:

"Bruce Momjian" <[EMAIL PROTECTED]> writes:


I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.


Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.


Note that with uniformly random updates, you will have dirtied every page of 
the table before you get anywhere near 5% of dead space. So we have to 
assume a non-uniform distribution of updates for the DSM to be of any help.


And if we assume non-uniform distribution, it's a good bet that the 
blocks that need vacuuming are also not randomly distributed. In fact, 
they might very well all be in one cluster, so that scanning that 
cluster is indeed sequential I/O.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Richard Huxton

Bruce Momjian wrote:

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.


Buying a house with a 25-year mortgage is much more expensive than just 
paying cash too, but you don't always have a choice.


Surely the key benefit of the partial vacuuming thing is that you can at 
least do something useful with a large table if a full vacuum takes 24 
hours and you only have 4 hours of idle I/O.


It's also occurred to me that all the discussion of scheduling way back 
when isn't directly addressing the issue. What most people want (I'm 
guessing) is to vacuum *when the user-workload allows* and the 
time-tabling is just a sysadmin's first approximation of that.


With partial vacuuming possible, we can arrange things with just three 
thresholds and two measurements:

  Measurement 1 = system workload
  Measurement 2 = a per-table "requires vacuuming" value
  Threshold 1 = workload at which we do more vacuuming
  Threshold 2 = workload at which we do less vacuuming
  Threshold 3 = point at which a table is considered worth vacuuming.

Once every 10 seconds, the manager compares the current workload to the 
thresholds and starts a new vacuum, kills one or does nothing. New 
vacuum processes keep getting started as long as there is workload spare 
and tables that need vacuuming.
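
As a sketch only (the measurement and control functions are placeholders
and the numbers are invented), the manager loop might look like:

#include <unistd.h>

/* Hypothetical hooks -- how to implement these is exactly the open
 * question below. */
extern double current_workload(void);               /* measurement 1 */
extern double table_vacuum_need(unsigned int *tbl); /* measurement 2: worst table */
extern int    running_vacuums(void);
extern void   start_partial_vacuum(unsigned int tbl);
extern void   stop_one_vacuum(void);

#define MORE_VACUUM_THRESHOLD 0.30   /* threshold 1: plenty of idle capacity */
#define LESS_VACUUM_THRESHOLD 0.80   /* threshold 2: user work is suffering */
#define TABLE_NEEDS_VACUUM    0.10   /* threshold 3: e.g. ~10% dead space */

static void
manager_loop(void)
{
    for (;;)
    {
        double          load = current_workload();
        unsigned int    tbl;
        double          need = table_vacuum_need(&tbl);

        if (load > LESS_VACUUM_THRESHOLD && running_vacuums() > 0)
            stop_one_vacuum();              /* back off: users need the I/O */
        else if (load < MORE_VACUUM_THRESHOLD && need > TABLE_NEEDS_VACUUM)
            start_partial_vacuum(tbl);      /* spare capacity: do more */
        /* otherwise do nothing this round */

        sleep(10);                          /* "once every 10 seconds" */
    }
}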


Now the trick of course is how you measure system workload in a 
meaningful manner.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Alvaro Herrera
Merlin Moncure wrote:
> On 1/22/07, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:

> >one for each INSERT+UPDATE block.  This way eiher both of them succeed
> >or fail, within one transaction.
> 
> i think so...Martijn said it best: you can 'rollback' to, but you
> can't 'commit' to.  The 'commit to' would be the arguably much more
> useful way of disposing of a savepoint.  But that should be taken up
> with sql standards committee :(.

You can RELEASE a savepoint though.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Alvaro Herrera
Bruce Momjian wrote:
> 
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table.  Not that extension is a
problem on itself, but it'll keep your working set smaller.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:

On 1/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > The short version is I would like the ability to run some sql commands
> > and recover the transaction if an error occurs.
>
> I'm getting tired of repeating this, but: neither of you have said
> anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
> What exactly is lacking in that feature?

I think the problem is with doing something like this:

BEGIN;
INSERT INTO foo VALUES ('1'); UPDATE status SET updated=now() WHERE tab='foo';
INSERT INTO bar VALUES ('2'); UPDATE status SET updated=now() WHERE tab='bar';
INSERT INTO baz VALUES ('3'); UPDATE status SET updated=now() WHERE tab='baz';
COMMIT;

This will issue three savepoints (if I understand how things wok correctly),


yes


one for each INSERT+UPDATE block.  This way eiher both of them succeed
or fail, within one transaction.


i think so...Martijn said it best: you can 'rollback' to, but you
can't 'commit' to.  The 'commit to' would be the arguably much more
useful way of disposing of a savepoint.  But that should be taken up
with sql standards committee :(.


One solution would be a psql command which
would fire given command on error condition, like:


yes, psql can handle this. while (IMO) a hack, it addresses the
shortcoming (scripting) specifically not handled by savepoints..

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Martijn van Oosterhout  wrote:

I don't understand this either. Everything you can do with nested
transactions you can also do with savepoints, so I'm really not
understand what the limitations are?

Actually, looking at the savepoint documentation, it looks like there
is no way to say:

if transaction_state ok then
  release X
else
  rollback to X


exactly.


Which is what a normal COMMIT does (sort of). This is very irritating
for scripting, so maybe a "COMMIT TO X" command would be auseful
addition?


right.  thats exactly what I want (more or less, there are a couple of
different ways to do it, but this is perfectly acceptable).  The on
errors bit was just a froofy addition that distracted from the core
problem.

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark

"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> Yep, agreed on the random I/O issue.  The larger question is if you have
> a huge table, do you care to reclaim 3% of the table size, rather than
> just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
> to take a lot of time, but vacuuming to relaim 3% three times seems like
> it is going to be more expensive than just vacuuming the 10% once.  And
> vacuuming to reclaim 1% ten times seems even more expensive.  The
> partial vacuum idea is starting to look like a loser to me again.

Well the answer is of course "that depends".

If you maintain the dead space at a steady state averaging 1.5% instead of 5%
your table is 3.33% smaller on average. If this is a DSS system that will
translate into running your queries 3.33% faster. It will take a lot of
vacuums before they hurt more than a 3%+ performance drop.

If it's an OLTP system then it's harder to figure. A 3.33% increase in data
density will translate to a higher cache hit rate, but how much higher depends
on a lot of factors. In our experiments we actually got a bigger boost in these
kinds of situations than I expected (I expected something comparable to the 3.33%
improvement). So it could be even more than 3.33%. But like I said, it depends.
If you already have the whole database cached you won't see any improvement. If
you are right on the cusp you could see a huge benefit.

It sounds like you're underestimating the performance drain 10% wasted space
has. If we found out that one routine was unnecessarily taking 10% of the cpu
time it would be an obvious focus of attention. 10% wasted space is going to
work out to about 10% of the i/o time.

It also sounds like we're still focused on the performance impact in absolute
terms. I'm much more interested in changing the performance characteristics so
they're predictable and scalable. It doesn't matter much if your 1kb table is
100% slower than necessary but it does matter if your 1TB table needs 1,000x
as much vacuuming as your 1GB table even if it's getting the same update
traffic.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Dawid Kuroczko

On 1/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:

> The short version is I would like the ability to run some sql commands
> and recover the transaction if an error occurs.

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?


I think the problem is with doing something like this:

BEGIN;
INSERT INTO foo VALUES ('1'); UPDATE status SET updated=now() WHERE tab='foo';
INSERT INTO bar VALUES ('2'); UPDATE status SET updated=now() WHERE tab='bar';
INSERT INTO baz VALUES ('3'); UPDATE status SET updated=now() WHERE tab='baz';
COMMIT;

This will issue three savepoints (if I understand how things work correctly),
one for each INSERT+UPDATE block.  This way either both of them succeed
or both fail, within one transaction.

Now, I think the problem the OP wanted to solve was having to keep commands
on one line just to have them "inside" one savepoint, and depending on psql(1)
to issue rollbacks for us.  I think the OP's idea was to be able to roll back
if an error occurred:
BEGIN;
SAVEPOINT s1;
INSERT...
UPDATE...
ROLLBACK TO s1 ON ERROR;
SAVEPOINT s2;
INSERT..
UPDATE...
ROLLBACK TO s2 ON ERROR;
UPDATE job SET ts = now(); -- OK
COMMIT; -- notice lack of rollback -- whole transaction will fail on error

One solution would be a psql command which
would fire given command on error condition, like:
BEGIN;
SAVEPOINT s1;
INSERT...
UPDATE...
\on_error ROLLBACK TO s1; INSERT INTO errors 
SAVEPOINT s2;

COMMIT;

  Regards,
 Dawid

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 11:21:12AM -0500, Merlin Moncure wrote:
> >\begin_nest
> >\commit_nest
> >\rollback_nest
> 
> That would work if we could rollback conditionally on failure (like
> on_error_rollback but with definable beginning and ending points).  I

Sorry, "rollback conditionally on failure" isn't parsing for me. Can
you give some example of what you mean?

> still think we are hacking around limitations of savepoints but it
> would solve the scripting problem at least.  A general implementation
> on the server would benefit everybody.

I don't understand this either. Everything you can do with nested
transactions you can also do with savepoints, so I'm really not
understanding what the limitations are?

Actually, looking at the savepoint documentation, it looks like there
is no way to say:

if transaction_state ok then
  release X
else
  rollback to X

Which is what a normal COMMIT does (sort of). This is very irritating
for scripting, so maybe a "COMMIT TO X" command would be a useful
addition?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian

Yep, agreed on the random I/O issue.  The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty?  I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once.  And
vacuuming to reclaim 1% ten times seems even more expensive.  The
partial vacuum idea is starting to look like a loser to me again.

---

Gregory Stark wrote:
> "Bruce Momjian" <[EMAIL PROTECTED]> writes:
> 
> > I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> > out how small, and it seems about 15% of the total table, which means if
> > we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> > 80%, assuming 5% of the table is scanned.
> 
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.
> 
> That is, if you assum that only 5% of the table will be scanned and you
> arrange to do it sequentially then you should expect the i/o to be marginally
> faster than just reading the entire table. Vacuum does do some cpu work and
> wouldn't have to consult the clog as often, so it would still be somewhat
> faster.
> 
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.
> I also had some theories involving linux being confused by the seeks and
> turning off read-ahead but I could never prove them.
> 
> In short, to see big benefits you would have to have a much smaller percentage
> of the table being read. That shouldn't be taken to mean that the DSM is a
> loser. There are plenty of use cases where tables can be extremely large and
> have only very small percentages that are busy. The big advantage of the DSM
> is that it takes the size of the table out of the equation and replaces it
> with the size of the busy portion of the table. So updating a single record in
> a terabyte table has the same costs as updating a single record in a kilobyte
> table.
> 
> Sadly that's not quite true due to indexes, and due to the size of the bitmap
> itself. But going back to your numbers it does mean that if you update a
> single row out of a terabyte table then we'll be removing about 85% of the i/o
> (minus the i/o needed to read the DSM, about .025%). If you update about 1%
> then you would be removing substantially less, and once you get to about 10%
> then you're back where you started.
> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Gregory Stark
"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
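
For reference, a stand-in for that kind of test (not the original program;
the 8 kB block size, PRNG and seed are assumptions) can be as small as:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

#define BLKSZ 8192

int
main(int argc, char **argv)
{
    double      fraction;
    int         fd;
    struct stat st;
    char        buf[BLKSZ];
    off_t       nblocks, blk;
    long        blocks_read = 0;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s file fraction\n", argv[0]);
        return 1;
    }
    fraction = atof(argv[2]);           /* e.g. 0.07 to read ~7% of blocks */
    fd = open(argv[1], O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0)
    {
        perror(argv[1]);
        return 1;
    }
    nblocks = st.st_size / BLKSZ;
    srandom(12345);
    for (blk = 0; blk < nblocks; blk++)
    {
        /* read this block with probability 'fraction', otherwise skip it */
        if ((double) random() / 2147483647.0 <= fraction)
        {
            if (pread(fd, buf, BLKSZ, blk * BLKSZ) != BLKSZ)
                break;
            blocks_read++;
        }
    }
    printf("read %ld of %ld blocks\n", blocks_read, (long) nblocks);
    close(fd);
    return 0;
}

Run it under time against a file larger than RAM, dropping the OS cache
between runs (on Linux, echo 3 > /proc/sys/vm/drop_caches), once with a
fraction of 1.0 and once with, say, 0.07, and compare the wall-clock times.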

That is, if you assume that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.

The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving linux being confused by the seeks and
turning off read-ahead but I could never prove them.

In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.

Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 12:18 -0500, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
> > 
> > In any case, for the statement "Index cleanup is the most expensive part 
> > of vacuum" to be true, you're indexes would have to take up 2x as much 
> > space as the heap, since the heap is scanned twice. I'm sure there's 
> > databases like that out there, but I don't think it's the common case.
> 
> I agree it index cleanup isn't > 50% of vacuum.  I was trying to figure
> out how small, and it seems about 15% of the total table, which means if
> we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> 80%, assuming 5% of the table is scanned.

Clearly keeping track of what needs vacuuming will lead to a more
efficient VACUUM. Your math applies to *any* design that uses some form
of book-keeping to focus in on the hot spots.

On a separate thread, Heikki has raised a different idea for VACUUM.

Heikki's idea asks an important question: where and how should DSM
information be maintained? Up to now everybody has assumed that it would
be maintained when DML took place and that the DSM would be a
transactional data structure (i.e. on-disk). Heikki's idea has
similar bookkeeping requirements to the original DSM concept, but the
interesting aspect is that the DSM information is collected off-line,
rather than being an overhead on every statement's response time.

That idea seems extremely valuable to me.

One of the main challenges is how we cope with large tables that have a
very fine spray of updates against them. A DSM bitmap won't help with
that situation, regrettably.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> Russell Smith wrote:
> >>> 2. Index cleanup is the most expensive part of vacuum.  So doing a 
> >>> partial vacuum actually means more I/O as you have to do index cleanup 
> >>> more often.
> >> I don't think that's usually the case. Index(es) are typically only a 
> >> fraction of the size of the table, and since 8.2 we do index vacuums in 
> >> a single scan in physical order. In fact, in many applications the index 
> >> is be mostly cached and the index scan doesn't generate any I/O at all.
> > 
> > Are _all_ the indexes cached?  I would doubt that.
> 
> Well, depends on your schema, of course. In many applications, yes.
> 
> >  Also, for typical
> > table, what percentage is the size of all indexes combined?
> 
> Well, there's no such thing as a typical table. As an anecdote here's 
> the ratios (total size of all indexes of a table)/(size of corresponding 
> heap) for the bigger tables for a DBT-2 run I have at hand:
> 
> Stock:1190470/68550 = 6%
> Order_line:   950103/274372 = 29%
> Customer: 629011 /(5711+20567) = 8%
> 
> In any case, for the statement "Index cleanup is the most expensive part 
> of vacuum" to be true, you're indexes would have to take up 2x as much 
> space as the heap, since the heap is scanned twice. I'm sure there's 
> databases like that out there, but I don't think it's the common case.

I agree that index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?


* the ability to span the savepoint across multiple statements.
* the ability to get what you want without wastefully creating a
savepoint before every statement.
* losing some behavior which (IMO) is general and beneficial.  how do
psql tricks help proper stored procedures should we ever get them?

That being said, some simple extensions to the psql rollback feature
would get the job done I guess.  I'm still not happy with it but I
knew it was a tough go from the beginning...I appreciate everyone's
comments.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 16:11 +, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> 
> > BEGIN;
> > stmt1;
> > stmt2; <-- error
> > stmt3;
> > COMMIT;
> >
> > results in stmt3 completing successfully even though stmt1 and stmt2 do
> > not == broken script.
> 
> stmt1 would still be completed successfully.

OK, understood. ON_ERROR_ROLLBACK is what we need, for psql only.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Martijn van Oosterhout  wrote:

we're talking about psql, maybe all we need to do is create the
commands in psql:

\begin_nest
\commit_nest
\rollback_nest


That would work if we could rollback conditionally on failure (like
on_error_rollback but with definable beginning and ending points).  I
still think we are hacking around limitations of savepoints but it
would solve the scripting problem at least.  A general implementation
on the server would benefit everybody.

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 10:49 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > ISTM that pg_dump should default to have ANALYZE statements for a table,
> > when both table definition and data are dumped. If only data (-a) or
> > only table definition (-s) is dumped the default should be to *not* add
> > an ANALYZE statement.
> 
> Having pg_dump emit ANALYZE was discussed and rejected years ago.
> Have you read that discussion?  Do you have any new arguments to make?

Well, its been suggested before, but I can't see any good arguments
against. Specifically, there was one person who spoke in favour of it
last time it was mentioned.

http://archives.postgresql.org/pgsql-hackers/2003-02/msg01270.php

If its a performance tip, we should be doing it automatically. If we
genuinely believe that autovacuum will handle it, then we can simply
alter the docs to say: if you aren't running autovacuum, then don't
forget to ANALYZE. IMHO it is not sufficient to rely upon autovacuum to
do all of this work for us.

Rod mentions the discussion has been raised before but doesn't state
what the arguments were:
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01264.php

These other posts also seem to be in favour of the idea...
http://archives.postgresql.org/pgsql-performance/2006-10/msg00142.php
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01273.php

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Gregory Stark
"Simon Riggs" <[EMAIL PROTECTED]> writes:

> BEGIN;
> stmt1;
> stmt2; <-- error
> stmt3;
> COMMIT;
>
> results in stmt3 completing successfully even though stmt1 and stmt2 do
> not == broken script.

stmt1 would still be completed successfully.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.
I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is be mostly cached and the index scan doesn't generate any I/O at all.


Are _all_ the indexes cached?  I would doubt that.


Well, depends on your schema, of course. In many applications, yes.


 Also, for typical
table, what percentage is the size of all indexes combined?


Well, there's no such thing as a typical table. As an anecdote here's 
the ratios (total size of all indexes of a table)/(size of corresponding 
heap) for the bigger tables for a DBT-2 run I have at hand:


Stock:  1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer:   629011 /(5711+20567) = 8%

In any case, for the statement "Index cleanup is the most expensive part 
of vacuum" to be true, you're indexes would have to take up 2x as much 
space as the heap, since the heap is scanned twice. I'm sure there's 
databases like that out there, but I don't think it's the common case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 10:46 -0500, Tom Lane wrote:
> "Merlin Moncure" <[EMAIL PROTECTED]> writes:
> > On 1/22/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> >> Could you post an example, just so we're all clear what the problems
> >> are? I thought I understood what you are requesting; I may not.
> 
> > ok,
> 
> > The short version is I would like the ability to run some sql commands
> > and recover the transaction if an error occurs.
> 
> I'm getting tired of repeating this, but: neither of you have said
> anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
> What exactly is lacking in that feature?

Sorry for not replying to your other post.

ON_ERROR_ROLLBACK doesn't do the same thing, that's why. It shuts out the
noise messages, true, but it doesn't re-execute all of the commands in
the transaction that succeeded and so breaks the transaction, as
originally coded.

BEGIN;
stmt1;
stmt2; <-- error
stmt3;
COMMIT;

results in stmt3 completing successfully even though stmt1 and stmt2 do
not == broken script.

The behaviour we've been discussing is when stmt2 fails, to allow stmt3
to be submitted, so that at commit, stmt1 and stmt3 effects will be
successful *if* the user wishes this.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Gregory Stark

"Simon Riggs" <[EMAIL PROTECTED]> writes:

> There doesn't seem to be any reason to skip the ANALYZE, but I'll
> implement it as an option. 
> -z on | off 
> --analyze=on | off

Only an aesthetic comment: 

Short options don't usually take on/off arguments; I would suggest making the
default be to analyze, and making -z and --analyze=off disable the analyze.

You might also consider having a --analyze=verbose and perhaps a
--analyze=full though currently that would require doing vacuum analyze.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Stefan Kaltenbrunner

Simon Riggs wrote:

There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option. 
-z on | off 
--analyze=on | off


This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?


how is this going to interact with the (now by default enabled) 
autovacuum daemon?



Stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> ISTM that pg_dump should default to have ANALYZE statements for a table,
> when both table definition and data are dumped. If only data (-a) or
> only table definition (-s) is dumped the default should be to *not* add
> an ANALYZE statement.

Having pg_dump emit ANALYZE was discussed and rejected years ago.
Have you read that discussion?  Do you have any new arguments to make?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Martijn van Oosterhout
On Mon, Jan 22, 2007 at 10:40:37AM -0500, Merlin Moncure wrote:
> The short version is I would like the ability to run some sql commands



> any error updating foo or bar will blow up the whole thing.  Maybe
> this is desirable, but it is often nice to be able to do some error
> handling here.  In the pre-savepoint NT implementation I could:



Nested transactions are trivially implemented on top of savepoints. If
we're talking about psql, maybe all we need to do is create the
commands in psql:

\begin_nest
\commit_nest
\rollback_nest

Would that suit your purpose?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> On 1/22/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
>> Could you post an example, just so we're all clear what the problems
>> are? I thought I understood what you are requesting; I may not.

> ok,

> The short version is I would like the ability to run some sql commands
> and recover the transaction if an error occurs.

I'm getting tired of repeating this, but: neither of you have said
anything that doesn't appear to me to be handled by ON_ERROR_ROLLBACK.
What exactly is lacking in that feature?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] pg_dump ANALYZE statements

2007-01-22 Thread Simon Riggs
There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option. 
-z on | off 
--analyze=on | off

This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Simon Riggs <[EMAIL PROTECTED]> wrote:

Could you post an example, just so we're all clear what the problems
are? I thought I understood what you are requesting; I may not.


ok,

The short version is I would like the ability to run some sql commands
and recover the transaction if an error occurs.

We have the ability to do this with savepoint...rollback to
savepoint...but these are not useful without introducing an external
language (c,perl) that can catch the errors and do a rollback to a
savepoint conditionally on the sql error state.

How would this be useful?
Well when I update production systems I often do this from a master
script that loads smaller scripts from another place:

-- update_production.sql
begin;
\i update_foo.sql
\i update_bar.sql
commit;

any error updating foo or bar will blow up the whole thing.  Maybe
this is desirable, but it is often nice to be able to do some error
handling here.  In the pre-savepoint NT implementation I could:

-- update_production.sql
begin;

begin;
insert into log values ('foo');
\i update_foo.sql
commit;

begin;
insert into log values ('bar');
\i update_bar.sql
commit;

commit;

In between the inner transactions I could check 'log' to see if
everything went through and take appropriate action.  Now client
applications have the luxury of being able to check the return code of
the query execution call, but SQL-only scripts can't.

This would be perfectly acceptable:

-- update_production.sql
begin;

savepoint foo;
\i update_foo.sql
rollback to savepoint foo [if I failed only];

savepoint bar;
\i update_bar.sql
rollback to savepoint bar [if I failed only];

commit;

This would be just great for scripts but would also help client side
programming a bit by introducing more flexible error handling
behaviors without having to handle things via the returned sql error
code.  The on errors bit I was talking about earlier is just syntax
sugar but the critical part is being able to recover transactions
partially without an external handler...
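
For reference, a minimal sketch of the kind of client-side wrapper this
currently requires (assuming libpq; the table and savepoint names are made
up): the client can test the result status and choose between RELEASE and
ROLLBACK TO SAVEPOINT, which is exactly what a pure SQL script cannot do.

#include <stdio.h>
#include <libpq-fe.h>

/* Run 'sql' inside savepoint 'sp'; keep its effects only if it succeeded. */
static int
exec_recoverable(PGconn *conn, const char *sp, const char *sql)
{
    char        cmd[128];
    PGresult   *res;
    int         ok;

    snprintf(cmd, sizeof(cmd), "SAVEPOINT %s", sp);
    PQclear(PQexec(conn, cmd));

    res = PQexec(conn, sql);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK ||
          PQresultStatus(res) == PGRES_TUPLES_OK);
    PQclear(res);

    snprintf(cmd, sizeof(cmd),
             ok ? "RELEASE SAVEPOINT %s" : "ROLLBACK TO SAVEPOINT %s", sp);
    PQclear(PQexec(conn, cmd));
    return ok;
}

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* connection info from environment */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(PQexec(conn, "BEGIN"));
    /* each step survives or is undone on its own, like the script above */
    exec_recoverable(conn, "s_foo", "INSERT INTO foo VALUES ('1')");
    exec_recoverable(conn, "s_bar", "INSERT INTO bar VALUES ('2')");
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}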

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Russell Smith wrote:
> > 2. Index cleanup is the most expensive part of vacuum.  So doing a 
> > partial vacuum actually means more I/O as you have to do index cleanup 
> > more often.
> 
> I don't think that's usually the case. Index(es) are typically only a 
> fraction of the size of the table, and since 8.2 we do index vacuums in 
> a single scan in physical order. In fact, in many applications the index 
> is be mostly cached and the index scan doesn't generate any I/O at all.

Are _all_ the indexes cached?  I would doubt that.  Also, for a typical
table, what percentage is the size of all indexes combined?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Mon, 2007-01-22 at 09:25 -0500, Merlin Moncure wrote:
> On 1/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Jaime Casanova" <[EMAIL PROTECTED]> writes:
> > > On 1/21/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> > >> - continue on error i.e. COMMIT can/might succeed - though there are
> > >> still cases where it cannot, such as a serializable exception.
> >
> > > and what should be the behaviour of that? the same as rollback?
> >
> > The only conceivable implementation is an implicit savepoint issued
> > before each statement.
> 
> I'm not sure I agree here...before the NT implementation was changed
> over to savepoint syntax it was perfectly possible to recover from
> errors inside a  transaction...and is still possible in plpgsql
> functions only.  What I'm asking for is to reopen this behavior
> somehow...in the production environments I've worked in application
> update and maintenance relied heavily on scripting, and lack of this
> functionality forces me to wrap the script launch with C code to work
> around limitations of the savepoint system.

Could you post an example, just so we're all clear what the problems
are? I thought I understood what you are requesting; I may not.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Piggybacking vacuum I/O

2007-01-22 Thread Heikki Linnakangas

I've been looking at the way we do vacuums.

The fundamental performance issue is that a vacuum generates 
nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to 
spread the cost like part payment, but the total is the same. In an I/O 
bound system, the extra I/O directly leads to less throughput.


Therefore, we need to do less I/O. Dead space map helps by allowing us 
to skip blocks that don't need vacuuming, reducing the # of I/Os to 
2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the 
dead tuples are spread uniformly.


If we could piggyback the vacuum I/Os to the I/Os that we're doing 
anyway, vacuum wouldn't ideally have to issue any I/O of its own. I've 
tried to figure out a way to do that.


Vacuum is done in 3 phases:

1. Scan heap
2. Vacuum index
3. Vacuum heap

Instead of doing a sequential scan, we could perform the 1st phase by 
watching the buffer pool, scanning blocks for dead tuples when they're 
in memory and keeping track of which pages we've seen. When all pages 
have been seen, the tid list is sorted and 1st phase is done.


In theory, the index vacuum could also be done that way, but let's 
assume for now that indexes would be scanned like they are currently.


The 3rd phase can be performed similarly to the 1st phase. Whenever a 
page enters the buffer pool, we check the tid list and remove any 
matching tuples from the page. When the list is empty, vacuum is complete.
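
In code terms, a very rough sketch of the bookkeeping follows; the
buffer-pool hook and the heap-specific helpers are hypothetical
placeholders rather than existing APIs.

typedef struct PiggybackTid
{
    unsigned int    blkno;
    unsigned short  offnum;
} PiggybackTid;

typedef struct PiggybackVacuum
{
    int             phase;      /* 1 = collecting dead tids, 3 = removing them */
    PiggybackTid   *tids;       /* sorted between phase 1 and phase 3 */
    long            ntids;
    unsigned char  *seen;       /* one bit per heap page already examined */
    long            pages_seen;
    long            rel_npages;
} PiggybackVacuum;

/* Placeholders for the heap-specific work; not existing functions. */
extern long collect_dead_tuples(PiggybackVacuum *pv, unsigned int blkno, void *page);
extern long remove_listed_tuples(PiggybackVacuum *pv, unsigned int blkno, void *page);

/* Hypothetical hook: called whenever a heap page of the target relation
 * passes through the buffer pool. */
static void
piggyback_on_page(PiggybackVacuum *pv, unsigned int blkno, void *page)
{
    if (pv->phase == 1)
    {
        if (pv->seen[blkno / 8] & (1 << (blkno % 8)))
            return;                     /* already scanned this page */
        pv->ntids += collect_dead_tuples(pv, blkno, page);
        pv->seen[blkno / 8] |= (unsigned char) (1 << (blkno % 8));
        if (++pv->pages_seen == pv->rel_npages)
        {
            /* sort tids, scan the indexes as today, then start phase 3 */
            pv->phase = 3;
        }
    }
    else if (pv->phase == 3)
    {
        pv->ntids -= remove_listed_tuples(pv, blkno, page);
        if (pv->ntids == 0)
        {
            /* all listed tuples removed: this vacuum cycle is complete */
        }
    }
}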


Of course, there are some issues in the design as described above. For 
example, the vacuum might take a long time if there's cold spots in the 
table. In fact, a block full of dead tuples might never be visited again.


A variation of the scheme would be to keep scanning pages that are in 
cache, until the tid list reaches a predefined size, instead of keeping 
track of which pages have already been seen. That would deal better with 
tables with hot and cold spots, but it couldn't advance the relfrozenxid 
because there would be no guarantee that all pages are visited. Also, we 
could start 1st phase of the next vacuum, while we're still in the 3rd 
phase of previous one.


Also, after we've seen 95% of the pages or a timeout expires, we could 
fetch the rest of them with random I/O to let the vacuum finish.


I'm not sure how exactly this would be implemented. Perhaps bgwriter or 
autovacuum would do it, or a new background process. Presumably the 
process would need access to relcache.


One issue is that if we're trying to vacuum every table simultaneously 
this way, we'll need more overall memory for the tid lists. I'm hoping 
there's a way to implement this without requiring shared memory for the 
tid lists, that would make the memory management a nightmare. Also, we'd 
need changes to bufmgr API to support this.


This would work nicely with the DSM. The list of pages that need to be 
visited in phase 1 could be initialized from the DSM, largely avoiding 
the problem with cold spots.


Any thoughts before I start experimenting?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/22/07, Csaba Nagy <[EMAIL PROTECTED]> wrote:

On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote:
[snip]
> To be honest, I'm not a huge fan of psql tricks (error recovery being
> another example)  but this could provide a solution.  in your opinion,
> how would you use \if to query the transaction state?

Wouldn't it make sense to introduce instead something like:

\set language plpgsql
... and then redirect to plpgsql all you type ?

That would give you the possibility to execute things in your favorite
language directly from psql without creating a function.


The nature of pl/pgsql would make this impossible, or at least highly
complex and difficult...one reason is that the language has much more
complex internal state than SQL.  Most other languages for which this
would be worthwhile already have their own immediate-execution
interpreters.

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Merlin Moncure

On 1/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On 1/21/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
>> - continue on error i.e. COMMIT can/might succeed - though there are
>> still cases where it cannot, such as a serializable exception.

> and what should be the behaviour of that? the same as rollback?

The only conceivable implementation is an implicit savepoint issued
before each statement.


I'm not sure I agree here...before the nested transaction (NT)
implementation was changed over to savepoint syntax it was perfectly
possible to recover from errors inside a transaction, and it still is,
but only in plpgsql functions.  What I'm asking for is to reopen this
behaviour somehow...in the production environments I've worked in,
application update and maintenance relied heavily on scripting, and the
lack of this functionality forces me to wrap the script launch with C
code to work around the limitations of the savepoint system.
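
For illustration, the shape of that C wrapper is roughly the following 
(a minimal libpq sketch, not the actual code I use; it assumes an open 
connection with a transaction already in progress and trims the error 
handling): each statement runs inside its own savepoint, so one failure 
doesn't doom the surrounding transaction.

#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

static bool exec_continue_on_error(PGconn *conn, const char *sql)
{
    PGresult *res;
    bool      ok;

    /* establish a per-statement savepoint */
    PQclear(PQexec(conn, "SAVEPOINT stmt"));

    res = PQexec(conn, sql);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK ||
          PQresultStatus(res) == PGRES_TUPLES_OK);
    PQclear(res);

    if (ok)
        PQclear(PQexec(conn, "RELEASE SAVEPOINT stmt"));
    else
    {
        /* statement failed: roll back to the savepoint and carry on */
        fprintf(stderr, "statement failed, skipping: %s",
                PQerrorMessage(conn));
        PQclear(PQexec(conn, "ROLLBACK TO SAVEPOINT stmt"));
        PQclear(PQexec(conn, "RELEASE SAVEPOINT stmt"));
    }
    return ok;
}

The script driver just issues BEGIN, calls this for every statement in
the file, and then issues COMMIT.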

In pure SQL, we have a 'begin' statement equivalent but no 'end'
statement.  Why not?

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Strange file in snapshot tarball

2007-01-22 Thread Michael Meskes
On Mon, Jan 22, 2007 at 08:00:26PM +0900, ITAGAKI Takahiro wrote:
> There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
> under the path 'src/interfaces/ecpg/test/extedted/' in the recent
> ftp snapshot (postgresql-snapshot.tar.gz).
> 
> All of the other files are placed under 'postgresql-snapshot/...'.
> Is this intentional or a mistake?

Would you please care to explain? I do not understand what you are
saying. 

[EMAIL PROTECTED]:~$ tar -ztvf postgresql-snapshot.tar.gz |grep OpenBSD
-rw-r--r-- pgsql/pgsql   32004 2007-01-12 12:31 
postgresql-snapshot/src/interfaces/ecpg/test/expected/compat_informix-dec_test-OpenBSD3.8.broken.stdout
-rw-r--r-- pgsql/pgsql   81524 2007-01-12 12:31 
postgresql-snapshot/src/interfaces/ecpg/test/expected/pgtypeslib-num_test2-OpenBSD3.8.broken.stdout

I don't see the difference here.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.


I don't think that's usually the case. Indexes are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is mostly cached and the index scan doesn't generate any I/O at all.


I believe the heap scans are the biggest issue at the moment.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Strange file in snapshot tarball

2007-01-22 Thread ITAGAKI Takahiro
Hello,

There is a file 'compat_informix-dec_test-OpenBSD3.8.broken.stdout'
under the path 'src/interfaces/ecpg/test/extedted/' in the recent
ftp snapshot (postgresql-snapshot.tar.gz).

All of the other files are placed under 'postgresql-snapshot/...'.
Is this intentional or a mistake?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread Peter Eisentraut
On Monday, 22 January 2007 at 10:32, [EMAIL PROTECTED] wrote:
> Is it possible to stop all user access to postgres, but still give access
> to admin?

Make the appropriate adjustments to pg_hba.conf.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread Russell Smith

[EMAIL PROTECTED] wrote:

Hi there,

Is it possible to stop all user access to postgres, but still give 
access to admin?

Just temporarily, not a security setup.

Something like, stop all users but allow user x and y.

You could restart in single-user mode, or alter pg_hba.conf to allow the 
users you want and disallow all other users.
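
As a rough illustration of the pg_hba.conf route (sketch only -- the 
"admin" role and the addresses are placeholders, and the server needs a 
reload afterwards), the first matching line wins, so the admin entries 
must come before the catch-all reject entries:

# TYPE  DATABASE  USER   CIDR-ADDRESS      METHOD
local   all       admin                    md5
host    all       admin  192.168.0.0/24    md5
# refuse everybody else while the maintenance is going on
local   all       all                      reject
host    all       all    0.0.0.0/0         reject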


Single-user mode will require that you have direct access to the machine 
to make the alterations.


Using pg_hba.conf will not disconnect existing users, as far as I'm aware.

That's the best advice I can offer; maybe somebody else will be able to 
give you more.


thx

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] STOP all user access except for admin for a few minutes?

2007-01-22 Thread org

Hi there,

Is it possible to stop all user access to postgres, but still give access to 
admin?

Just temporarily, not a security setup.

Something like, stop all users but allow user x and y.

thx 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Csaba Nagy
On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote:
[snip]
> To be honest, I'm not a huge fan of psql tricks (error recovery being
> another example)  but this could provide a solution.  in your opinion,
> how would you use \if to query the transaction state?

Wouldn't it make sense to introduce instead something like:

\set language plpgsql
... and then redirect to plpgsql all you type ?

That would give you the possibility to execute things in your favorite
language directly from psql without creating a function.

Cheers,
Csaba.



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Simon Riggs
On Sun, 2007-01-21 at 14:26 -0600, Jim C. Nasby wrote:
> On Sun, Jan 21, 2007 at 11:39:45AM +, Heikki Linnakangas wrote:
> > Russell Smith wrote:
> > >Strange idea that I haven't researched,  Given Vacuum can't be run in a 
> > >transaction, it is possible at a certain point to quit the current 
> > >transaction and start another one.  There has been much chat and now a 
> > >TODO item about allowing multiple vacuums to not starve small tables.  
> > >But if a big table has a long running vacuum the vacuum of the small 
> > >table won't be effective anyway will it?  If vacuum of a big table was 
> > >done in multiple transactions you could reduce the effect of long 
> > >running vacuum.  I'm not sure how this effects the rest of the system 
> > >thought.
> > 
> > That was fixed by Hannu Krosing's patch in 8.2 that made vacuum to 
> > ignore other vacuums in the oldest xmin calculation.
> 
> And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
> will commit and start a new transaction. 

Err... it doesn't do this now, and IIRC it didn't do that in 8.1 either.

> That's still useful even with
> Hannu's patch in case you start a vacuum with maintenance_work_mem too
> small; you can abort the vacuum some time later and at least some of the
> work it's done will get committed.

True, but not recommended, for a variety of reasons.

The reason is not intermediate commits, but just that the work of VACUUM
is mostly non-transactional in nature, apart from the various catalog
entries when it completes.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] savepoint improvements

2007-01-22 Thread Simon Riggs
On Sun, 2007-01-21 at 13:28 -0500, Tom Lane wrote:
> "Jaime Casanova" <[EMAIL PROTECTED]> writes:
> > On 1/21/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> >> - continue on error i.e. COMMIT can/might succeed - though there are
> >> still cases where it cannot, such as a serializable exception.
> 
> > and what should be the behaviour of that? the same as rollback?

No. The behaviour is to continue the transaction even though an error
has occurred, i.e.

BEGIN;

1. INSERT...
success

2. INSERT  VALUES () () ()
--fails with error on 3rd VALUES statement

dynamically re-construct INSERT statement with remaining 2 VALUES
statements

3. INSERT VALUES () ();
success

COMMIT;
work done by 1 and 3 is committed

Behaviour needs to support any error at (2) except serializable
exceptions.

> The only conceivable implementation is an implicit savepoint issued
> before each statement.  

Perhaps the only acceptable one.

> By and large that seems to me to be most easily
> handled on the client side, and many of our client libraries already
> have the ability to do it. 

PL/pgSQL supports EXCEPTION blocks, but no other clients do, AFAICS.

>  (For instance, psql has ON_ERROR_ROLLBACK.)

That's not the same thing, regrettably.

> If we tried to do it on the server side, we would break any client
> software that wasn't prepared for the change of behavior --- see the 7.3
> autocommit fiasco for an example.

Only if we changed the default behaviour, which I am not suggesting.

> So as far as the server is concerned, I see no TODO here.

If the server team won't allow it, we must document that this behaviour
must be a client-side function in the *server* TODO, so that all the
various client projects can read the same TODO item and implement it.

"Implement continue-on-error transactional behaviour for each client
library".

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster