Re: [HACKERS] [COMMITTERS] pgsql: Fix PGXS conventions so that extensions can be built against

2007-07-03 Thread Fabien COELHO


Dear Robert,


Fix PGXS conventions so that extensions can be built against Postgres
installations whose pg_config program does not appear first in the PATH.
Per gripe from Eddie Stanley and subsequent discussions with Fabien Coelho
and others.


Is there any chance of this being backpatched?  I just spent a few hours
tracking down a problem with compiling a 3rd-party module against an 8.2
installation installed separately from my system's packaged install. (I.e. I
didn't move it, but there was an additional pg_config on the system pointing
to the wrong/other place.)  I'm not exactly sure how pgxs ever worked on
systems with multiple Postgres installations,


Simple: the target pg_config MUST be ahead in the PATH.

--
Fabien.



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Gregory Stark
"Pavel Stehule" <[EMAIL PROTECTED]> writes:

> A global temp table can be created from its template only when it is used.
> It has no negative effect on an app which doesn't use it. The benefit of
> g.t.t. is the simplification of stored procedures.

And if it's used in 200 txns/s? Imagine the earlier poster who was looking for
a way to display the count of records matching a search followed by the ten
records on the page without re-executing the search.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala


I attach the In-Place Upgrade project concept. Technical details about the 
implementation of each part will be sent later (after the concept is accepted).


Please, let me know your comments.

thanks Zdenek
In-place Upgrade project
---

Overview

The PostgreSQL community releases a new version of PostgreSQL at least once a year, 
but the released version is not able to run with old data files. For a user who 
wants to upgrade his database server, upgrading a bigger database is a nightmare, 
and upgrading a very large database is impossible. This project will try to 
solve this issue and provide functionality for PostgreSQL that covers 
all users' requirements.


Current status
--
1) Data upgrade is supported only by exporting and importing data using pg_dump
   - slow and causes a long downtime
   - extra disk space needed
   - requires both versions (old & new)
2) pg_migrator
   - faster than 1), but it still has a long downtime when the on-disk structure has 
changed
   - the previous version of PostgreSQL is needed
   - no downgrade
   - requires both versions (old & new)


Project Goals
-
1) Minimize the database downtime.
2) Do not require extra disk space.
3) Do not require an old version of PostgreSQL to be installed.
4) Allow a fallback to the original PostgreSQL version if something goes wrong.
5) Support all data types, including user-defined types.
6) Be easy to use.


Functional specification

There are several different approaches to performing the upgrade. We assume 
that we need to convert a very big database. If we upgrade the data files 
offline, the database downtime might be critical, so we would rather try to convert 
the data online. This brings the requirement that the new major version of PostgreSQL 
must understand the old data structures and must be able to work with them.

The next important requirement is the ability to fall back (downgrade) to the 
previous PostgreSQL version. When the new major version does not work well for the 
user, he needs to be able to go back to the previous version with minimal 
downtime. This brings the requirement that PostgreSQL should also be able to 
write data in the old format. However, we might expect some performance penalty.

The above requirements lead to PostgreSQL running in three modes:

compatibility mode (compat) - a new version is running on the old data files. 
It is possible to read and write data as in a previous version. Features of a 
new version might not be available. Fallback to the previous version is 
possible.

upgrade mode - a conversion of data files is running in the background. 
PostgreSQL has to be able to work with both the new data file format and the 
old data file format. Features of a new version might not be available. 
PostgreSQL has to be able to recover and continue in the upgrade mode in case 
of a power/system failure. Fallback to the previous version is not possible.

normal mode - all required data files have been upgraded to the new format. Features 
of a new version are available. Fallback to the previous version is not 
possible.
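
To make the modes concrete, here is a minimal sketch of how they might be
represented; the names are invented for illustration and are not part of the
proposal's code:

```c
/* Illustrative sketch only: hypothetical names, not proposed code. */
#include <stdbool.h>

typedef enum UpgradeMode
{
	UPGRADE_MODE_COMPAT,	/* old on-disk format; readable by the old version */
	UPGRADE_MODE_UPGRADE,	/* background conversion running; both formats understood */
	UPGRADE_MODE_NORMAL		/* all data files converted; new features enabled */
} UpgradeMode;

/*
 * In compat mode every page must stay readable by the previous major
 * version, so writing the new page format would have to be refused.
 */
static bool
new_format_writes_allowed(UpgradeMode mode)
{
	return mode != UPGRADE_MODE_COMPAT;
}
```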

Project Roadmap
-
1) Create a detailed design for each affected part
2) Design a testing environment/process
3) Implementation
4) Change development/release processes - patch authors have to be responsible 
for providing the required upgrade operations/functions. Some more upgrade-related 
development might also be done after feature freeze. In-place upgrade 
should be available from version x.y.1.


List of affected parts
--

1) Data Page Structure
1.1) Page Layout
1.2) Tuple header and structure
1.3) Data encoding - data type changes
2) Control File
3) WAL
4) Catalog
5) Configuration files
6) Others (Procedural languages, communication protocol, ...)





Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Michael Paesold

Joshua D. Drake wrote:

Alvaro Herrera wrote:

Joshua D. Drake wrote:
Did we change the default autovac parameters for 8.3 (beyond turning 
it on?) because on any reasonably used database, they are way too 
conservative.


We're still on time to change them ...  Any concrete proposals?


I could provide numbers from production high use databases. We could 
probably back those down a little and make more reasonable numbers.


Please do so. Perhaps others can also tell their typical settings.

Best Regards
Michael Paesold



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Heikki Linnakangas

Zdenek Kotala wrote:

Project Goals
-
...
3) Do not require an old version of PostgreSQL to be installed.


Why not? Having two versions installed at the same time doesn't seem 
like a problem to me. You can remove the old version as soon as the 
upgrade is done.



4) Allow a fallback to the original PostgreSQL version if something goes wrong.
...
compatibility mode (compat) - a new version is running on the old data files. 
It is possible to read and write data as in a previous version. Features of a 
new version might not be available. Fallback to the previous version is 
possible.


That's madness. Why would you want to do that? The complexity it adds is 
just mind-boggling.


You still wouldn't be able to downgrade after you switch from 
compatibility mode, and it seems likely that any problems, whatever they 
might be, would not appear until you switch. That means you'll need to 
be prepared to downgrade using a backup anyway, so the compatibility 
mode doesn't buy you much.



upgrade mode - a conversion of data files is running in the background. 
PostgreSQL has to be able to work with both the new data file format and the 
old data file format. Features of a new version might not be available. 
PostgreSQL has to be able to recover and continue in the upgrade mode in case 
of a power/system failure. Fallback to the previous version is not possible.


That's feasible, though I don't see why new features wouldn't be available.

As before, upgrade can be done, it's just a matter of someone scratching 
the itch. pg_migrator can handle the catalog changes. Doing the page 
conversion from 8.2 -> 8.3 is possible, and it could be done on-the-fly 
inside PostgreSQL the first time a page is read in.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Tom Lane escribió:
>>> I rather doubt that.  The most likely implementation would involve
>>> cloning a "template" entry into pg_class.
>
>> How about a new relkind which causes the table to be located in
>> PGDATA/base//pg_temp_/
>> So each backend can have its own copy of the table with the same
>> relfilenode; there's no need for extra catalog entries.
>
> Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
> pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
> its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
> this?

I would have suggested that when we construct the relcache entry for the table
we substitute a local version of relfilenode for the global one.
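
A very rough sketch of that idea, using made-up structure and function names
rather than the real relcache code:

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Heavily simplified, hypothetical relcache entry. */
typedef struct RelCacheEntrySketch
{
	Oid		reloid;				/* OID of the (global temp) table */
	Oid		relfilenode;		/* physical file this backend actually uses */
	bool	is_global_temp;
} RelCacheEntrySketch;

/*
 * Sketch: while building the relcache entry for a global temp table,
 * ignore the shared relfilenode from pg_class and substitute a
 * backend-local one, so each backend gets its own private storage.
 */
static void
assign_local_relfilenode(RelCacheEntrySketch *rel, Oid local_filenode)
{
	if (rel->is_global_temp)
		rel->relfilenode = local_filenode;
}
```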

None of those sound like hard problems. Certainly it's more invasive this way,
but the other way is just a hack for complying with the letter of the spec
without actually making it work right. It would be silly, and in many use
cases useless, to have regular DML that operates on data which has no business
being anything but backend-local generate garbage in on-disk catalog tables.

I had a strange thought though. The ideal data structure for local
pg_statistic data in the unlikely case that users analyze their local tables
would in fact be a global temporary table as well. I wonder if we could
bootstrap something similar for pg_class as well.

Incidentally, what would one imagine relfrozenxid would be useful for on these
tables anyway?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Pavel Stehule

> A global temp table can be created from its template only when it is used.
> It has no negative effect on an app which doesn't use it. The benefit of
> g.t.t. is the simplification of stored procedures.

And if it's used in 200 txns/s? Imagine the earlier poster who was looking for
a way to display the count of records matching a search followed by the ten
records on the page without re-executing the search.



I wrote about a comparison of global temp tables and current temp tables.

Counting the result's records is a problem, I know. It's an incompleteness
of the current cursor implementation. Every cursor can be materialised
and then counted. We need an OPEN operation which materialises the
cursor and returns the real row_count.

Regards
Pavel



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Gregory Stark

"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:

> As before, upgrade can be done, it's just a matter of someone scratching the
> itch. pg_migrator can handle the catalog changes. Doing the page conversion
> from 8.2 -> 8.3 is possible, and it could be done on-the-fly inside PostgreSQL
> the first time a page is read in.

I was previously thinking a converter for the packed varlena change wouldn't
be necessary since it handles things just fine when it finds a 4-byte header
where a 1-byte header might have been used. I just realized that's not true.

All varlena headers would have to be shifted two bits to the left (on
little-endian machines) and have their toast bits fiddled even if we don't
bother converting them to shrink their size. Externally toasted varlenas
would, however, necessarily change size because they must use the new format.
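
To make the bit-fiddling concrete, here is a rough sketch; the bit layouts
below are my approximation of the old 4-byte header and the new little-endian
4-byte header, not copied from postgres.h:

```c
#include <stdint.h>

/* Assumed old (pre-8.3) layout: top two bits are the toast flags,
 * lower 30 bits are the total length in bytes. */
#define OLD_MASK_EXTERNAL	0x80000000U
#define OLD_MASK_COMPRESSED	0x40000000U
#define OLD_MASK_LENGTH		0x3FFFFFFFU

/*
 * Sketch of rewriting one old-style 4-byte varlena header into the assumed
 * new little-endian 4-byte layout (length shifted left two bits, low bits
 * used as flags), without bothering to shrink it to a 1-byte header.
 * Returns 0 for externally toasted values, which need a full rewrite of the
 * datum into the new TOAST pointer format, not just a header fix.
 */
static uint32_t
convert_varlena_header(uint32_t old_header)
{
	uint32_t	len = old_header & OLD_MASK_LENGTH;

	if (old_header & OLD_MASK_EXTERNAL)
		return 0;					/* caller must handle external datums */
	if (old_header & OLD_MASK_COMPRESSED)
		return (len << 2) | 0x02;	/* assumed "4-byte, compressed" tag */
	return len << 2;				/* assumed "4-byte, uncompressed" tag */
}
```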

This is actually a bit of a problem. We would need to know when we read in a
page what the tupledescriptor for that relation looks like to know which
fields are varlena. I'm not sure how easy it would be to arrange for the tuple
descriptor to be passed down that far.

Conceivably we could grab another infomask bit to indicate "uses new-style
varlenas" and then have heaptuple.c understand how to convert them in place.
But that leads to a ton of memory management or page locking problems.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Zdenek Kotala wrote:

Project Goals
-
...
3) Do not require an old version of PostgreSQL to be installed.


Why not? Having two versions installed at the same time doesn't seem 
like a problem to me. You can remove the old version as soon as the 
upgrade is done.


This can be a problem if you want to upgrade the operating system together 
with the DB and the new operating system does not ship that version. This 
requirement is based on operating system vendor constraints and also on my 
experience with an Oracle upgrade on a Tru64 cluster, where two installed 
versions of Oracle did not work well together - OK, PostgreSQL is better in 
this respect :-).



4) Allow a fallback to the original PostgreSQL version if something 
goes wrong.

...
compatibility mode (compat) - a new version is running on the old data 
files. It is possible to read and write data as in a previous version. 
Features of a new version might not be available. Fallback to the 
previous version is possible.


That's madness. Why would you want to do that? The complexity it adds is 
just mind-boggling.


I don't think that it increases complexity much. You will have a conversion 
function to convert pages/data from version x to x+1, and the same approach could 
be used to convert data back. It needs more investigation, but I'm not 
afraid of the complexity (for now :-).


You still wouldn't be able to downgrade after you switch from 
compatibility mode, and it seems likely that any problems, whatever they 
might be, would not appear until you switch. That means you'll need to 
be prepared to downgrade using a backup anyway, so the compatibility 
mode doesn't buy you much.


One kind of problem is user-defined types/functions and extensions. You can 
test a lot of things on test machines, but some problems only occur on 
big tables/storage. Not many PostgreSQL users have a multi-terabyte disk 
array for testing.


A second kind of problem is an operating system upgrade. If you upgrade the OS 
and PostgreSQL together and something fails (not only the new version of 
PostgreSQL, but many other things), then the admin wants to revert back to the 
old OS version with the old Postgres version. In Solaris this is called the Live 
Upgrade feature.


This feature would make that possible to do easily. However, backup/restore 
is a possible solution - unfortunately not a very comfortable one. I think it is 
a nice-to-have requirement: if it turns out to be possible we can do it, and if 
it brings a lot of complications, we can drop it.


upgrade mode - a conversion of data files is running in the 
background. PostgreSQL has to be able to work with both the new data 
file format and the old data file format. Features of a new version 
might not be available. PostgreSQL has to be able to recover and 
continue in the upgrade mode in case of a power/system failure. 
Fallback to the previous version is not possible.


That's feasible, though I don't see why new features wouldn't be available.


For example, Oracle 10 implemented a new incremental backup feature, 
but you can use it only once you have converted the database to the correct 
format. I'm not Oracle and I'm not able to predict which new features in 
PostgreSQL will require, for example, all pages to be updated. I don't expect 
that the database upgrade mode will affect a lot of new features, but we 
must keep it in mind.



Zdenek



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Heikki Linnakangas

Gregory Stark wrote:

"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:


As before, upgrade can be done, it's just a matter of someone scratching the
itch. pg_migrator can handle the catalog changes. Doing the page conversion
from 8.2 -> 8.3 is possible, and it could be done on-the-fly inside PostgreSQL
the first time a page is read in.


I was previously thinking a converter for the packed varlena change wouldn't
be necessary since it handles things just fine when it finds a 4-byte header
where a 1-byte header might have been used. I just realized that's not true.

All varlena headers would have to be shifted two bits to the left (on
little-endian machines) and have their toast bits fiddled even if we don't
bother converting them to shrink their size. Externally toasted varlenas
would, however, necessarily change size because they must use the new format.

This is actually a bit of a problem. We would need to know when we read in a
page what the tupledescriptor for that relation looks like to know which
fields are varlena. I'm not sure how easy it would be to arrange for the tuple
descriptor to be passed down that far.


Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
Relation, which contains the TupleDesc, so I don't think that's a 
problem. Not sure how easy that would be to do in an external program 
like pg_migrator.



Conceivably we could grab another infomask bit to indicate "uses new-style
varlenas" and then have heaptuple.c understand how to convert them in place.
But that leads to a ton of memory management or page locking problems.


My thinking is that when a page in the old format is read in, it's 
converted to the new format before doing anything else with it.
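
A sketch of what that could look like at the point the buffer has just been
read in; the function names and version number here are invented for
illustration, not taken from bufmgr.c:

```c
#include <stdint.h>

#define OLD_PAGE_LAYOUT_VERSION 3	/* assumed pre-upgrade layout version */

/* Minimal stand-in for the page header field we need. */
typedef struct PageHeaderSketch
{
	uint16_t	pd_pagesize_version;	/* page size OR'd with layout version */
} PageHeaderSketch;

static uint8_t
page_layout_version(const char *page)
{
	return (uint8_t) (((const PageHeaderSketch *) page)->pd_pagesize_version & 0x00FF);
}

/* Hypothetical converter: rewrites tuple headers and varlenas in place. */
extern void convert_page_to_current_format(char *page, const void *tuple_desc);

/*
 * Sketch: right after a page has been read into its buffer, and before
 * anyone else looks at it, check the layout version and convert if needed.
 */
static void
maybe_convert_page(char *page, const void *tuple_desc)
{
	if (page_layout_version(page) == OLD_PAGE_LAYOUT_VERSION)
		convert_page_to_current_format(page, tuple_desc);
}
```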


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Gregory Stark wrote:




Conceivably we could grab another infomask bit to indicate "uses 
new-style
varlenas" and then have heaptuple.c understand how to convert them in 
place.

But that leads to a ton of memory management or page locking problems.


My thinking is that when a page in the old format is read in, it's 
converted to the new format before doing anything else with it.


Yes, I agree with Heikki. Any other place for the conversion would create a 
big maintenance burden.


Zdenek





Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Martijn van Oosterhout
On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:
> >This is actually a bit of a problem. We would need to know when we
> >read in a page what the tupledescriptor for that relation looks like
> >to know which fields are varlena. I'm not sure how easy it would be
> >to arrange for the tuple descriptor to be passed down that far.
> 
> Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
> Relation, which contains the TupleDesc, so I don't think that's a 
> problem. Not sure how easy that would be to do in an external program 
> like pg_migrator.

My reading of this thread so far is that we're intending to upgrade
pages on the fly? Which means that at any point in time, some pages
will be converted and some not, so the tuple descriptor isn't going to
help, surely you need some identifier on the page telling you if it's
upgraded or not?

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Heikki Linnakangas

Martijn van Oosterhout wrote:

On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:

This is actually a bit of a problem. We would need to know when we
read in a page what the tupledescriptor for that relation looks like
to know which fields are varlena. I'm not sure how easy it would be
to arrange for the tuple descriptor to be passed down that far.
Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
Relation, which contains the TupleDesc, so I don't think that's a 
problem. Not sure how easy that would be to do in an external program 
like pg_migrator.


My reading of this thread so far is that we're intending to upgrade
pages on the fly? Which means that at any point in time, some pages
will be converted and some not, so the tuple descriptor isn't going to
help, surely you need some identifier on the page telling you if it's
upgraded or not?


Yes, we already have such an identifier. The page layout version number 
is stored on every page.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Martijn van Oosterhout wrote:

On Tue, Jul 03, 2007 at 12:05:07PM +0100, Heikki Linnakangas wrote:

This is actually a bit of a problem. We would need to know when we
read in a page what the tupledescriptor for that relation looks like
to know which fields are varlena. I'm not sure how easy it would be
to arrange for the tuple descriptor to be passed down that far.
Speaking of on-the-fly upgrading, ReadBuffer is already passed the 
Relation, which contains the TupleDesc, so I don't think that's a 
problem. Not sure how easy that would be to do in an external program 
like pg_migrator.


My reading of this thread so far is that we're intending to upgrade
pages on the fly? Which means that at any point in time, some pages
will be converted and some not, so the tuple descriptor isn't going to
help, surely you need some identifier on the page telling you if it's
upgraded or not?


Currently there is a page layout version number. But this information is 
not useful for conversion from 8.1 or 8.2, because both versions have the same 
number, yet some datatypes (inet/cidr) have a different storage format.


There is a small discussion about the page version here:

http://archives.postgresql.org/pgsql-hackers/2007-06/msg00745.php

Page identification is one thing that must be improved.


Zdenek



Re: [HACKERS] Postgresql.conf cleanup

2007-07-03 Thread Peter Eisentraut
On Monday, 2 July 2007 13:03, Josh Berkus wrote:
> (change requires restart): this phrase appears over 20 times in the
> notes.  This is enough times to be really repetitive and take up a lot
> of scrolling space, while not actually covering all startup-time
> parameters.

Which ones are missing?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Bruce Momjian
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane escribió:
> >> I rather doubt that.  The most likely implementation would involve
> >> cloning a "template" entry into pg_class.
> 
> > How about a new relkind which causes the table to be located in
> > PGDATA/base//pg_temp_/
> > So each backend can have its own copy of the table with the same
> > relfilenode; there's no need for extra catalog entries.
> 
> Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
> pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
> its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
> this?

And what is the use-case for this functionality?  What does it give us
that we don't already have?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] todo: Hash index creation

2007-07-03 Thread Naz Gassiep




Wow... not sure how I missed that. I *did* create this schema ages ago,
perhaps it wasn't there, or at the time I had no idea what the
implications were. *shrug*
Regards,
- Naz.


Tom Lane wrote:

  Naz Gassiep <[EMAIL PROTECTED]> writes:

As a result, when creating tables containing large blocks of text I wish
to index, I've been using HASH as an index method. Please can we state
in the manual that HASH index types are in a beta stage of development
or something similar, or perhaps remove the manual entry altogether
until HASH is at a point where it is usable in production.

Uh, the manual already does say

Note: Testing has shown PostgreSQL's hash indexes to perform no better
than B-tree indexes, and the index size and build time for hash indexes
is much worse. Furthermore, hash index operations are not presently
WAL-logged, so hash indexes might need to be rebuilt with REINDEX after
a database crash. For these reasons, hash index use is presently
discouraged.

under 11.2 Index Types, as well as various derogatory remarks elsewhere.

			regards, tom lane

  





[HACKERS] how to "pg_dump", based in select command

2007-07-03 Thread Jeferson Kasper

Hello folks.
I want to know how to dump some parts of a database.
I need to extract the records returned by a "select * from table", have pg_dump or
another tool create a file with the records found by this select
command, and after this, restore the file into another database with
the same structure.
Any idea?
Thanks, people.

Jeferson Kasper


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
My thinking is that when a page in the old format is read in, it's 
converted to the new format before doing anything else with it.


Yeah, I'm with Heikki on this.  What I see as a sane project definition
is:

* pg_migrator or equivalent to convert the system catalogs
* a hook in ReadBuffer to allow a data page conversion procedure to
  be applied, on the basis of checking for old page layout version.



pg_migrator is a separate tool which requires the old Postgres version, and I 
would like to have a solution in the postgres binary that works without the old 
version being present. Very often the new Postgres version is installed in the same 
location (e.g. /usr/bin) and normal users could have a problem.


I see three possible solutions:

1) a special postgres startup mode - postgres --upgrade-catalog
2) automatic conversion - postgres converts the catalog automatically on 
its first startup on an old db cluster
3) (in compat mode) the catalog is converted on the fly (read/write) until 
upgrade mode is started


> I think insisting on a downgrade option is an absolutely certain way
> of guaranteeing that the project will fail.

As I mentioned before, this is a nice-to-have requirement. I would like to 
keep it in mind, and when it starts causing a complexity explosion we can 
remove it from the requirement list.



I'm not sure it's feasible to expect that we can change representations
of user-defined types, either.  I don't see how you would do that
without catalog access (to look up the UDT), and the page conversion
procedure is going to have to be able to operate without catalog
accesses.  (Thought experiment: a page is read in during crash recovery
or PITR slave operation, and discovered to have the old format.)


The idea for solving the problem of an on-disk representation change for a data type 
is to keep both the old and new datatype in/out functions. Newly created tables will 
contain the new type implementation, and old tables could be converted with the 
ALTER TABLE command on user request. The old data type could be kept in a 
compat library.



BTW, I thought of a likely upgrade problem that we haven't discussed
(AFAIR) in any of the many threads on this subject.  What about an index
access method change that involves an index-wide restructuring, such
that it can't be done one page at a time?  A plausible example is
changing hash indexes to have multiple buckets per page.  Presumably
you can fix the index with REINDEX, but that doesn't meet the goal of
limited downtime, if the index is big.  Is there another way?



Yes, there is a way: keep both the old and new implementations of the index, each 
with a different OID. The primary key for the pg_am table will be 
name+pg_version - it is similar to the UDT solution. CREATE INDEX as well as 
REINDEX will use the current implementation.



Zdenek



Re: [HACKERS] how to "pg_dump", based in select command

2007-07-03 Thread Josh Tolley

On 7/3/07, Jeferson Kasper <[EMAIL PROTECTED]> wrote:

Hello folks.
I want to know how to dump some parts of a database.
I need to extract the records returned by a "select * from table", have pg_dump or
another tool create a file with the records found by this select
command, and after this, restore the file into another database with
the same structure.
Any idea?
Thanks, people.

Jeferson Kasper



In 8.2 you can use the COPY command with a query, as in COPY (SELECT
...) TO 'filename'

In earlier versions, one alternative is to create a table filled with
the results of your query and pg_dump it. Alternatively you might also
pipe the query into psql and pipe the output to a file, use psql's \o
option, etc.

-Josh



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Martijn van Oosterhout
On Tue, Jul 03, 2007 at 11:36:03AM -0400, Tom Lane wrote:
> I'm not sure it's feasible to expect that we can change representations
> of user-defined types, either.  I don't see how you would do that
> without catalog access (to look up the UDT), and the page conversion
> procedure is going to have to be able to operate without catalog
> accesses.  (Thought experiment: a page is read in during crash recovery
> or PITR slave operation, and discovered to have the old format.)

Well, there are two types of conversions:
1. Simple byte rearrangement. If it's not too many you could simply
build them into pg_migrator. Doesn't help with user-defined types, but
maybe you allow plugins to define a separate hook whose only purpose is
to upgrade the value (without catalog access...). 

2. Otherwise you could do a VACUUM over the table to touch every page,
thus solving it. Dunno what to do about crashing at this point.

Hmm, actually, what's the problem with PITR restoring a page in the old
format. As long as it's clear it's the old format it'll get fixed when
the page is actually used.

> BTW, I thought of a likely upgrade problem that we haven't discussed
> (AFAIR) in any of the many threads on this subject.  What about an index
> access method change that involves an index-wide restructuring, such
> that it can't be done one page at a time?  A plausible example is
> changing hash indexes to have multiple buckets per page.  Presumably
> you can fix the index with REINDEX, but that doesn't meet the goal of
> limited downtime, if the index is big.  Is there another way?

Well, we have concurrent index builds these days. I certainly don't
have ideas on how to fix this, especially if the index is on a datatype
that has changed format... I suppose those indexes will just have to be
rebuilt (a REINDEX will upgrade every page in the table anyway...). I
think it'd still be cheaper than dump/restore.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Martijn van Oosterhout
On Tue, Jul 03, 2007 at 07:13:29PM +0200, Zdenek Kotala wrote:
> pg_migrator is a separate tool which requires the old Postgres version, and I 
> would like to have a solution in the postgres binary that works without the old 
> version being present. Very often the new Postgres version is installed in the same 
> location (e.g. /usr/bin) and normal users could have a problem.

Possibly. But you have to demonstrate it actually works and it's best
to do that with a separate process. Or fix pg_migrator to not require
the old version; either way will do.

> > I think insisting on a downgrade option is an absolutely certain way
> > of guaranteeing that the project will fail.
> 
> As I mentioned before, this is a nice-to-have requirement. I would like to 
> keep it in mind, and when it starts causing a complexity explosion we can 
> remove it from the requirement list.

It seems to me that if every step of the process is WAL logged, then
downgrading is simply a matter of restoring and using PITR. Downgrades
are hard work, primarily because in the long run it's going to be
totally untested code because hardly anyone is going to need it.

> Yes, there is way to keep old and new implementation of index and each 
> will have different oid. Primary key for pg_am table will be 
> name+pg_version - It is similar to UDT solution. CREATE INDEX as a 
> REINDEX will use actual implementation.

Bad idea. Indexes are already complicated pieces of code; there's no
guarantee the old code will even work with the new version. Seems like
too much work considering REINDEX will simply fix the problem outright.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Yeah, I'm with Heikki on this.  What I see as a sane project definition
>> is:
>> 
>> * pg_migrator or equivalent to convert the system catalogs
>> * a hook in ReadBuffer to allow a data page conversion procedure to
>> be applied, on the basis of checking for old page layout version.

> pg_migrator is a separate tool which requires the old Postgres version, and I 
> would like to have a solution in the postgres binary that works without the old 
> version being present. Very often the new Postgres version is installed in the same 
> location (e.g. /usr/bin) and normal users could have a problem.

Again, you are setting yourself up for complete failure if you insist
on having every possible nicety in the first version.  An incremental
approach is far more likely to succeed than a "big bang".

I don't see a strong need to have a solution in-the-binary at all.
I would envision that packagers of, say, 8.4 would include a minimal
8.3 build under an old/ subdirectory, and pg_migrator or a similar
tool could invoke the old postmaster from there to do the catalog
dumping.  (In an RPM or similar environment, the user could even
"rpm -e postgresql-upgrade" to get rid of the deadwood after completing
the upgrade, whereas with an integrated binary you're stuck carrying
around a lot of one-time-use code.)

This strikes me as approximately a thousand percent more maintainable
than trying to have a single set of code coping with multiple catalog
representations.  Also it scales easily to supporting more than one back
version, whereas doing the same inside one binary will not scale at all.

Keep in mind that if your proposal involves any serious limitation on
the developers' freedom to refactor internal backend APIs or change
catalog representations around, it *will be rejected*.  Do not have any
illusions on that point.  It'll be a tough enough sell freezing on-disk
representations for user data.  Demanding the internal ability to read
old catalog versions would be a large and ongoing drag on development;
I do not think we'll hold still for it.  (To point out just one of many
problems, it'd largely destroy the C-struct-overlay technique for
reading catalogs.)

regards, tom lane



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Martijn van Oosterhout wrote:

On Tue, Jul 03, 2007 at 07:13:29PM +0200, Zdenek Kotala wrote:
pg_migrator is a separate tool which requires the old Postgres version, and I 
would like to have a solution in the postgres binary that works without the old 
version being present. Very often the new Postgres version is installed in the same 
location (e.g. /usr/bin) and normal users could have a problem.


Possibly. But you have to demonstrate it actually works and it's best
to do that with a seperate process. Or fix pg_migrator to not require
the old version, either way will do.


pg_migrator uses pg_dump to dump the catalog, and the main concept is based on 
the presence of the old Postgres version. I think "fix it" here means "rewrite it".



I think insisting on a downgrade option is an absolutely certain way
of guaranteeing that the project will fail.
As I mentioned before, this is a nice-to-have requirement. I would like to 
keep it in mind, and when it starts causing a complexity explosion we can 
remove it from the requirement list.


It seems to me that if every step of the process is WAL logged, then
downgrading is simply a matter of restoring and using PITR. Downgrades
are hard work, primarily because in the long run it's going to be
totally untested code because hardly anyone is going to need it.


It is not a downgrade. It is about keeping the old structure until the user says 
to convert to the new data structure.


Yes, there is a way: keep both the old and new implementations of the index, each 
with a different OID. The primary key for the pg_am table will be 
name+pg_version - it is similar to the UDT solution. CREATE INDEX as well as 
REINDEX will use the current implementation.


Bad idea. Indexes are already complicated pieces of code; there's no
guarantee the old code will even work with the new version. Seems like
too much work considering REINDEX will simply fix the problem outright.


I do not expect that the old code will work with the new index structure. I want 
to keep both implementations: old indexes will be processed by the old code 
and new ones will be processed by the new implementation. Each will have a 
different OID, and pg_class.relam will point to the correct implementation. 
These obsolete index methods could be located in a separate compat lib.


Zdenek



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> It is not a downgrade. It is about keeping the old structure until the user says 
> to convert to the new data structure.

As Martijn already pointed out, the odds of problems surfacing only
after that conversion starts seem high enough to render the whole idea
a bit pointless.  IMHO it's not worth the enormous development costs
it will add ... and it's *certainly* unwise to tie success of the
entire in-place-upgrade project to that one feature.

The attractive point about pg_migrator plus page-at-a-time data upgrade
is that it'd solve 90% of the problem with 10% of the work.  If you get
that going, and people get accustomed to working with the development
restrictions associated with data upgradability, then you might be able
to come back and make a case for catalog upgradability and/or
downgradability in some future version.  But right now you're asking
people to do 90% of the work before having anything at all.

> I do not expect that the old code will work with the new index structure. I want 
> to keep both implementations: old indexes will be processed by the old code 
> and new ones will be processed by the new implementation. Each will have a 
> different OID, and pg_class.relam will point to the correct implementation. 

I don't think it's quite that easy when you consider user-defined
datatypes.  Where are you going to get two sets of opclasses from?

regards, tom lane



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> My thinking is that when a page in the old format is read in, it's 
> converted to the new format before doing anything else with it.

Yeah, I'm with Heikki on this.  What I see as a sane project definition
is:

* pg_migrator or equivalent to convert the system catalogs
* a hook in ReadBuffer to allow a data page conversion procedure to
  be applied, on the basis of checking for old page layout version.
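
Purely as an illustration of that hook idea (these are invented names, not an
existing PostgreSQL API), the dispatch could be keyed on the layout version
found in the page header:

```c
#include <stddef.h>

#define MAX_PAGE_LAYOUT_VERSION 16		/* arbitrary bound for this sketch */

/* A conversion procedure rewrites one page, in place, to the current format. */
typedef void (*PageConvertProc) (char *page);

static PageConvertProc page_converters[MAX_PAGE_LAYOUT_VERSION + 1];

/* Registration hook: install a converter for pages written with 'old_version'. */
static void
register_page_converter(unsigned old_version, PageConvertProc proc)
{
	if (old_version <= MAX_PAGE_LAYOUT_VERSION)
		page_converters[old_version] = proc;
}

/* Called from the ReadBuffer path once the page is in shared memory. */
static void
apply_page_converter(char *page, unsigned page_version, unsigned current_version)
{
	if (page_version != current_version &&
		page_version <= MAX_PAGE_LAYOUT_VERSION &&
		page_converters[page_version] != NULL)
		page_converters[page_version] (page);
}
```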

I think insisting on a downgrade option is an absolutely certain way
of guaranteeing that the project will fail.

I'm not sure it's feasible to expect that we can change representations
of user-defined types, either.  I don't see how you would do that
without catalog access (to look up the UDT), and the page conversion
procedure is going to have to be able to operate without catalog
accesses.  (Thought experiment: a page is read in during crash recovery
or PITR slave operation, and discovered to have the old format.)


BTW, I thought of a likely upgrade problem that we haven't discussed
(AFAIR) in any of the many threads on this subject.  What about an index
access method change that involves an index-wide restructuring, such
that it can't be done one page at a time?  A plausible example is
changing hash indexes to have multiple buckets per page.  Presumably
you can fix the index with REINDEX, but that doesn't meet the goal of
limited downtime, if the index is big.  Is there another way?

regards, tom lane



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Zdenek Kotala

Tom Lane wrote:



Again, you are setting yourself up for complete failure if you insist
on having every possible nicety in the first version.  An incremental
approach is far more likely to succeed than a "big bang".


Yes, I know. I don't want to solve everything in one patch. I'm just 
looking forward and trying to summarize the complexity of the problem. I 
don't want to find myself on a dead-end road saying "oops" after a couple of months 
of development ...


Zdenek



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Richard Huxton

Tom Lane wrote:

Keep in mind that if your proposal involves any serious limitation on
the developers' freedom to refactor internal backend APIs or change
catalog representations around, it *will be rejected*.  Do not have any
illusions on that point.  It'll be a tough enough sell freezing on-disk
representations for user data.  Demanding the internal ability to read
old catalog versions would be a large and ongoing drag on development;
I do not think we'll hold still for it.  (To point out just one of many
problems, it'd largely destroy the C-struct-overlay technique for
reading catalogs.)


One thing no-one's mentioned is how we're going to deal with definitive 
incompatibilities.


Examples:
- Tightening of UTF8 code. Means some text from old version won't transfer.
- Changing behaviour of greatest() - recently discussed. Might 
invalidate views/application queries.


It's the second example that I can see biting; the UTF stuff is big 
enough that it'll be noticed. It'd be all too easy to have a change in 
some inet-addr function that you don't notice your app is using. I can't 
think of any way of definitively auditing what features are in use (or 
have changed between versions).


Or are these examples of changes that will only be allowed, e.g., every 
other major version?


--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> On Tue, Jul 03, 2007 at 11:36:03AM -0400, Tom Lane wrote:
>> ... (Thought experiment: a page is read in during crash recovery
>> or PITR slave operation, and discovered to have the old format.)

> Hmm, actually, what's the problem with PITR restoring a page in the old
> format. As long as it's clear it's the old format it'll get fixed when
> the page is actually used.

Well, what I'm concerned about is something like a WAL record providing
a new-format tuple to be inserted into a page, and then you find that
the page contains old-format tuples.

[ thinks some more... ]  Actually, so long as we are willing to posit that

1. You're only allowed to upgrade a DB that's been cleanly shut down
(no replay of old-format WAL logs allowed)

2. Page format conversion is WAL-logged as a complete page replacement

then AFAICS WAL-reading operations should never have to apply any
updates to an old-format page; the first touch of any old page in the
WAL sequence should be a page replacement that updates it to new format.
This is not different from the argument why full_page_writes ensures
recovery from write failures.

So in principle the page-conversion stuff should always operate in a
live transaction.  (Which is good, because now that I think about it
we couldn't emit a WAL record for the page conversion in those other
contexts.)  I still feel pretty twitchy about letting it do catalog
access, though, because it has to operate at such a low level of the
system.  bufmgr.c has no business invoking anything that might do
catalog access.  If nothing else there are deadlock issues.

On the whole I think we could define format conversions for user-defined
types as "not our problem".  A new version of a UDT that has an
incompatible representation on disk can simply be treated as a new type
with a different OID, exactly as Zdenek was suggesting for index AMs.
To upgrade a database containing such a column, you install
"my_udt_old.so" that services the old representation, ALTER TYPE my_udt
RENAME TO my_udt_old, then install new type my_udt and start using that.
Anyway that seems good enough for version 1.0 --- I don't recall that
we've ever changed the on-disk representation of any contrib/ types,
so how important is this scenario in the real world?

regards, tom lane



[HACKERS] ACM Paper relevant to our buffer algorithm

2007-07-03 Thread Gregory Stark

Incidentally I found this paper in ACM SIGMETRICS 1992 covering more or less
precisely the same algorithm we're using for our clock sweep. I haven't quite
digested it yet myself so I'm not sure what the conclusions about weights tell
us to do with our buffer usage counter.

I put a copy up for download since even though it's permitted to copy I don't
know how to find a public link. It's kind of big so please download it and
read it locally, don't try to read it from my machine:

http://stark.xeocode.com/~stark/p35-nicola.pdf.gz

Regarding copyright the paper sez:

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] ACM Paper relevant to our buffer algorithm

2007-07-03 Thread Greg Smith
Here are some more recent papers that also give good insight into research 
in this area:


http://www.cs.usask.ca/~wew036/comprehensive.pdf
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
> Joshua D. Drake wrote:
> >Alvaro Herrera wrote:
> >>Joshua D. Drake wrote:
> >>>Did we change the default autovac parameters for 8.3 (beyond turning 
> > >>>it on?) because on any reasonably used database, they are way too 
> >>>conservative.
> >>
> >>We're still on time to change them ...  Any concrete proposals?
> >
> >I could provide numbers from production high use databases. We could 
> >probably back those down a little and make more reasonable numbers.
> 
> Please do so. Perhaps others can also tell their typical settings.

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
> 
> Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
> text now.
> 
> Is there a reason to say anything beyond "use autovac"?

There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
>> Is there a reason to say anything beyond "use autovac"?

> There is; I know that things like web session tables aren't handled very
> well by autovacuum if there are any moderately large tables (anything
> that will take more than a few minutes to vacuum). Eventually we should
> be able to accommodate that case with multiple workers, but we'll need a
> mechanism to ensure that at least one worker doesn't get tied up in
> large vacuums.

And which part of that do you think isn't resolved in 8.3?

regards, tom lane



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Alvaro Herrera
Jim C. Nasby wrote:
> On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
> > Joshua D. Drake wrote:
> > >Alvaro Herrera wrote:
> > >>Joshua D. Drake wrote:
> > >>>Did we change the default autovac parameters for 8.3 (beyond turning 
> > >>>it on?) because on any reasonably used database, they are way too 
> > >>>conservative.
> > >>
> > >>We're still on time to change them ...  Any concrete proposals?
> > >
> > >I could provide numbers from production high use databases. We could 
> > >probably back those down a little and make more reasonable numbers.
> > 
> > Please do so. Perhaps others can also tell their typical settings.
> 
> FWIW, I normally go with the 8.2 defaults, though I could see dropping
> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> could be decreased further, maybe divide by 10.

How about pushing thresholds all the way down to 0?

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte" (Andre Breton)



Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Jim C. Nasby
On Tue, Jul 03, 2007 at 11:49:05AM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > Tom Lane escribió:
> > >> I rather doubt that.  The most likely implementation would involve
> > >> cloning a "template" entry into pg_class.
> > 
> > > How about a new relkind which causes the table to be located in
> > > PGDATA/base//pg_temp_/
> > > So each backend can have its own copy of the table with the same
> > > relfilenode; there's no need for extra catalog entries.
> > 
> > Uh-huh.  And what do you do with relpages, reltuples, relfrozenxid, and
> > pg_statistic entries?  What if one backend wants to TRUNCATE or CLUSTER
> > its copy (requiring a new relfilenode)?  Where does ALTER TABLE fit into
> > this?
> 
> And what is the use-case for this functionality?  What does it give us
> that we don't already have?

The use case is any system that uses temp tables in an OLTP setting,
which certainly isn't uncommon. The problem today (and likewise with a
global temp table that still writes to the catalogs) is that
every OLTP operation that creates or drops a temp table is doing DDL.
At best, that leads to a lot of catalog bloat. Right now, it appears to
also expose some race conditions (we've got a customer that's been bit
by this and we've been able to reproduce some odd behavior in the lab).
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Kevin Grittner
>>> On Tue, Jul 3, 2007 at  3:36 PM, in message <[EMAIL PROTECTED]>,
Tom Lane <[EMAIL PROTECTED]> wrote: 
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
>> On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
>>> Is there a reason to say anything beyond "use autovac"?
> 
>> There is; I know that things like web session tables aren't handled very
>> well by autovacuum if there are any moderately large tables (anything
>> that will take more than a few minutes to vacuum). Eventually we should
>> be able to accommodate that case with multiple workers, but we'll need a
>> mechanism to ensure that at least one worker doesn't get tied up in
>> large vacuums.
> 
> And which part of that do you think isn't resolved in 8.3?
 
We have a 406GB table where 304GB is in one table.  The next two tables
are 57GB and 40GB.  Inserts to these three tables are constant during the
business day, along with inserts, updates, and very few deletes to the
other tables.  Database modifications are few and scattered at night and
on weekends.  Virtually all queries are during the business day.  The
large tables are "insert only" except for a weekend delete of the oldest
one week of data, to keep a rolling set of just over a year.  (No, we
really don't want to go to weekly partitions, if it can be avoided.)
 
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second.  Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
at the end of business day, it ran well into the next day, and the users
complained of slowness until it stopped.  We changed to a weeknight vacuum
analyze of the volatile tables which aren't in the big three, and a vacuum
analyze of the entire database right after the weekly delete.
 
Isn't this a use case where we don't want to count on autovacuum, both
from a table bloat perspective and the user impact perspective, even under
8.3?
 
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
 
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
 
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
 
-Kevin
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> 2. Page format conversion is WAL-logged as a complete page replacement

It seems we get that for free. By definition any modification to a page after
conversion will be the first record for that page since a checkpoint. It just
means we have to force full_page_writes to be true, at least for a buffer
that's been converted if not whenever we're processing a mixed format
database.

> On the whole I think we could define format conversions for user-defined
> types as "not our problem".  

Hm, perhaps we could do it by doing as much of the work up-front as possible.
Requiring datatypes to define a function in pg_proc to do the conversion and
require that it be a function which can stand on its own without catalog
lookups or other facilities. Then when we start up on a mixed format database
we load all those functions into a hash by typeoid and provide that hash to
bufmgr. By definition any datatypes we expect to find in the database must
already be in the catalog before we start.
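
(To make the shape of it concrete -- this is only a rough sketch in plain C,
with invented names and an invented callback signature, nothing like the
real backend API:

    #include <stdbool.h>
    #include <stddef.h>

    typedef unsigned int Oid;      /* stand-in for the real typedef */

    /*
     * Hypothetical conversion callback: rewrites one datum from the old
     * on-disk representation to the new one in place, returning false if
     * it can't (say, the result wouldn't fit in maxlen bytes).
     */
    typedef bool (*datum_convert_fn) (char *datum, int *len, int maxlen);

    typedef struct ConvertEntry
    {
        Oid              typeoid;
        datum_convert_fn convert;
    } ConvertEntry;

    /* Filled once at backend startup, sorted by type OID, then read-only */
    static ConvertEntry *convert_table;
    static int           convert_count;

    /*
     * Lookup bufmgr could do while rewriting an old-format page:
     * no catalog access, just a binary search over the preloaded array.
     */
    static datum_convert_fn
    lookup_converter(Oid typeoid)
    {
        int lo = 0;
        int hi = convert_count - 1;

        while (lo <= hi)
        {
            int mid = (lo + hi) / 2;

            if (convert_table[mid].typeoid == typeoid)
                return convert_table[mid].convert;
            if (convert_table[mid].typeoid < typeoid)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return NULL;    /* this type's representation didn't change */
    }

The point is just that once the array is built, everything bufmgr needs is a
pointer chase away, never a catalog or syscache hit.)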

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Kevin Grittner
>>> On Tue, Jul 3, 2007 at  5:17 PM, in message
<[EMAIL PROTECTED]>, "Kevin Grittner"
<[EMAIL PROTECTED]> wrote: 
>  
> We have a 406GB table where 304GB is in one table.  The next two tables

It's probably obvious, but I meant a 406GB database.  Sorry.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Alvaro Herrera
Kevin Grittner wrote:

> We have a 406GB table where 304GB is in one table.  The next two tables
> are 57GB and 40GB.  Inserts to these three tables are constant during the
> business day, along with inserts, updates, and very few deletes to the
> other tables.  Database modifications are few and scattered at night and
> on weekends.  Virtually all queries are during the business day.  The
> large tables are "insert only" except for a weekend delete of the oldest
> one week of data, to keep a rolling set of just over a year.  (No, we
> really don't want to go to weekly partitions, if it can be avoided.)
>  
> Autovacuum is enabled with very aggressive settings, to cover small
> tables, including one with about 75 rows that can be updated 100 or more
> times per second.  Even with these settings there is zero chance of any
> table of even moderate size hitting the autovacuum threshold between our
> scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
> at the end of business day, it ran well into the next day, and the users
> complained of slowness until it stopped.  We changed to a weeknight vacuum
> analyze of the volatile tables which aren't in the big three, and a vacuum
> analyze of the entire database right after the weekly delete.

Sounds like you would be served by setting those specific tables to a
lower vacuum scale factor (keeping a more normal default for the rest of
the tables), and having a non-zero vacuum delay setting (to avoid
excessive I/O consumption).  Have you tried that?
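
To put rough numbers on it, using the documented trigger rule (vacuum
threshold = base threshold + scale factor * number of tuples): with the
stock 0.2 scale factor a table of, say, a billion rows isn't vacuumed until
roughly 200 million dead rows have piled up, whereas a per-table scale
factor of 0.01 would kick a vacuum off at about 10 million.  The billion-row
figure is only for illustration, of course -- plug in your own table sizes.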

The problem you would still have with 8.2 is that while one of these
tables is being vacuumed the rest won't be vacuumed at all.  In 8.3 the
other tables can still be vacuumed regularly with the big vacuum still
running (a feature I dubbed "multiple workers", but we're still waiting
to know what name the marketing guys are gonna use).

> In terms of our autovacuum settings, we have several different types of
> databases, and in all of them we seem to do well with these changes from
> the 8.2 defaults, combined with (except for the above configuration) a
> nightly database vacuum:
>  
> autovacuum_naptime = 10s

Another change in 8.3 is that the naptime is per-database, i.e. the time
between two consecutive autovac runs on a database.  So with a setting
of 10s, if you have 10 databases there will be one autovac run per
second, whereas on 8.2 there would be one autovac every 10 seconds
(unless you run out of worker slots).


> Oh, the tiny, high-update tables occasionally bloat to hundreds or
> thousands of pages because of long-running transactions, so we schedule
> a daily cluster on those, just to keep things tidy.

If you can afford the cluster then there's no problem.  I don't expect
that to change in 8.3.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> 2. Page format conversion is WAL-logged as a complete page replacement

> It seems we get that for free. By definition any modification to a page after
> conversion will be the first record for that page since a checkpoint.

No, I don't think so.  Consider a page that's been sucked in for just a
SELECT.  Or did you plan to pay the page conversion cost over and over
again until it's finally modified?  Anyway the whole idea fails if the
first mod that needs to be made is just a hint-bit update.

> Hm, perhaps we could do it by doing as much of the work up-front as possible.
> Requiring datatypes to define a function in pg_proc to do the conversion and
> require that it be a function which can stand on its own without catalog
> lookups or other facilities.

I don't think you quite got the point of "no catalog lookups".  That
includes not looking up pg_type or pg_proc entries.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Richard Huxton <[EMAIL PROTECTED]> writes:
> One thing no-one's mentioned is how we're going to deal with definitive 
> incompatibilities.

I don't really think that in-place update changes that story at all.
The advice has always been "read the release notes and test your
applications before updating".  The only thing that in-place update
will offer is shorter update downtime than dump/reload.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> "Tom Lane" <[EMAIL PROTECTED]> writes:
>>> 2. Page format conversion is WAL-logged as a complete page replacement
>
>> It seems we get that for free. By definition any modification to a page after
>> conversion will be the first record for that page since a checkpoint.
>
> No, I don't think so.  Consider a page that's been sucked in for just a
> SELECT.  Or did you plan to pay the page conversion cost over and over
> again until it's finally modified?  Anyway the whole idea fails if the
> first mod that needs to be made is just a hint-bit update.

Ah, true, it would have to dirty the page, and that means putting an LSN in,
which in turn means needing a WAL record for it to point at.  But the actual
record is almost irrelevant, as any record will cause the full page to be
attached.

>> Hm, perhaps we could do it by doing as much of the work up-front as possible.
>> Requiring datatypes to define a function in pg_proc to do the conversion and
>> require that it be a function which can stand on its own without catalog
>> lookups or other facilities.
>
> I don't think you quite got the point of "no catalog lookups".  That
> includes not looking up pg_type or pg_proc entries.

ISTM that it's really just not looking up pg_type or pg_proc entries *from
within the low level storage code* that's evil. If they're all sucked in on
backend startup and restricted by policy to not be able to do anything fancy
themselves then I don't see your objection.

I do have an objection though: TOAST data. Even were the bufmgr to be provided
with a mechanism to find a data type conversion function I don't see what it
would do about a toasted datum. Obviously bufmgr can't fiddle with it, and if
it leaves it for pg_detoast_datum then even if there were some way for
pg_detoast_datum to know whether it's an old format datum or a new format one
there's no way for it to replace the TOAST datum on disk with the new one.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal: In-Place upgrade concept

2007-07-03 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> I do have an objection though: TOAST data. Even were the bufmgr to be
> provided with a mechanism to find a data type conversion function I
> don't see what it would do about a toasted datum.

Urgh, that *is* a nasty thought :-(

Out-of-line datums aren't the only issue, either: consider inline
compressed datums.  A data representation change, even one that is known
not to increase the ordinary uncompressed size of the datum, could
easily render it slightly less compressible, resulting in a risk that
the tuple doesn't fit on the page anymore.  It hardly seems practical
(maybe not even possible) to guarantee that this cannot happen.

So maybe we are up against the conclusion that in-place updates cannot
support datatype representation changes, at least not for toastable
datatypes.  We could still handle 'em by the expedient suggested
upthread for user-defined types, ie the "new representation" is treated
as a whole new type.  That's not terribly appetizing though; I had
expected we could be more efficient for the case of changes in built-in
types.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Why so many out-of-disk-space failures on buildfarm machines?

2007-07-03 Thread Tom Lane
It seems like we see a remarkable number of occurrences of $subject.
For instance, right now we have these members failing on various
branches:

echidna     No space left on device
asp         No space left on device
herring     No space left on device (icc seems particularly unable
            to cope with this, or at least I suspect that's the
            reason for some builds failing with that bizarre message)
kite        gcc quoth "Internal compiler error: Segmentation fault"
wildebeest  long-standing configuration error (no Tk installed)
wombat      long-standing configuration error (no Tk installed)

I realize that a lot of these members are running on old underpowered
machines with not so much disk, but is it possible that the buildfarm
itself is leaking disk space?  Not cleaning up log files for instance?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Matthew T. O'Connor

Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>> FWIW, I normally go with the 8.2 defaults, though I could see dropping
>> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
>> could be decreased further, maybe divide by 10.
>
> How about pushing thresholds all the way down to 0?

As long as it handles small (or zero row) tables ok then yes.  The 
base_threshold in the original contrib autovacuum was just an easy way 
to not vacuum really small tables too often.  If a table has only 10 
rows, it's going to get vacuumed every time one row is updated.  I guess 
that's not a big problem with a table that small but still seems excessive.
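
(For the record, with a zero base threshold and the stock 0.2 scale factor
the trigger point on a 10-row table works out to 0 + 0.2 * 10 = 2 dead rows,
so it's really every couple of updates rather than every single one -- but
the point stands.)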


If you think this isn't a problem with the current autovacuum, then sure 
turn it down to zero, and perhaps we can even get rid of it altogether 
in another release or two.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Why so many out-of-disk-space failures on buildfarm machines?

2007-07-03 Thread Kris Jurka



On Tue, 3 Jul 2007, Tom Lane wrote:

> I realize that a lot of these members are running on old underpowered
> machines with not so much disk, but is it possible that the buildfarm
> itself is leaking disk space?  Not cleaning up log files for instance?

No, the buildfarm does not leak disk space.  It is possible that members 
are configured with --keepall, which keeps the entire directory tree if a 
failure occurs.  That can fill up a lot of space quickly when you get a 
failure.


Kris Jurka

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] what is difference between LOCAL and GLOBAL TEMP TABLES in PostgreSQL

2007-07-03 Thread Bruce Momjian
> The use case is any system that uses temp tables in an OLTP setting,
> which certainly isn't uncommon. The problem today (and likewise with a
> global temp table that still writes to the catalogs) is that every OLTP
> operation that creates or drops a temp table is doing DDL. At best,
> that leads to a lot of catalog bloat. Right now, it also appears to
> expose some race conditions (we've got a customer that's been bitten by
> this, and we've been able to reproduce some odd behavior in the lab).

The solution is to fix the bloat, not add a work-around.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org