Re: [HACKERS] Why O_SYNC is faster than fsync on ext3

2004-03-21 Thread Manfred Spraul
Yusuf Goolamabbas wrote:

I sent this to Bruce but forgot to cc pgsql-hackers, The patches are
likely to go into 2.6.6. People interested in extremely safe fsync
writes should also follow the IDE barrier thread and the true fsync() in
Linux on IDE thread
 

Actually the most interesting part of the thread was the initial post 
from Peter Zaitsev about fcntl(fd, F_FULLFSYNC, NULL): he wrote that this 
call is necessary on Mac OS X to force a flush of the disks' write 
caches. Unfortunately I can't find anything about this flag with Google.

Another interesting point is that right now, IDE write caches must be 
disabled for reliable fsync operation with Linux. Recent SuSE kernels 
contain partial support for write barriers. If the existing patches are 
completed and merged, it will be safe to enable write caching.

Perhaps Bruce's cache flush test could be modified slightly to check 
that the OS isn't lying about fsync: if fsync is faster than the 
rotational delay of the disks, then the setup is not suitable for 
postgres. This could be recommended as a setup test in the install document.
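A setup check along these lines can be sketched in a few lines of Python. This is an illustration of the idea only, not Bruce's actual test program; the 8 KB block size, iteration count, and the 7200 rpm figure are assumptions:

```python
import os
import time

# Rotational latency bounds how fast a disk that really flushes its
# cache can complete back-to-back fsyncs: roughly one revolution each.
ROTATIONAL_DELAY_7200RPM = 60.0 / 7200  # ~8.3 ms

def fsync_latency(path, iters=50):
    """Average seconds per write-plus-fsync cycle on one 8 KB block."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    block = b"\0" * 8192
    start = time.perf_counter()
    for _ in range(iters):
        os.write(fd, block)
        os.fsync(fd)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed / iters

def drive_probably_lies(path):
    """True if fsync returns much faster than one disk revolution."""
    return fsync_latency(path) < ROTATIONAL_DELAY_7200RPM / 2
```

With a write cache absorbing the flush, the per-fsync time typically comes back in microseconds; a disk that honestly waits for the platter cannot beat its own rotation.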

--
   Manfred
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Why O_SYNC is faster than fsync on ext3

2004-03-21 Thread Tom Lane
Yusuf Goolamabbas <[EMAIL PROTECTED]> writes:
> Bruce, haven't followed the thread completely. Accessing the web archive
> is slow from Hong Kong but I just wanted to point you to this lkml post
> which shows why O_SYNC is much faster than fsync (at least on ext3)
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107959907410443&w=2

That patch is broken on its face.  If O_SYNC doesn't take longer than
O_DSYNC, and likewise fsync longer than fdatasync, then the Unix
filesystem semantics are not being honored because the file mod time
isn't being updated.
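The distinction is visible from userspace: fsync must flush metadata such as the mod time as well as the data, while fdatasync may skip metadata not needed to retrieve the data. A rough timing comparison, as a sketch only (fdatasync is assumed available, as on Linux, with a fallback elsewhere):

```python
import os
import time

# fdatasync is POSIX but not universal; fall back to fsync if missing.
_fdatasync = getattr(os, "fdatasync", os.fsync)

def compare_sync_costs(path, iters=20):
    """Return (avg fsync seconds, avg fdatasync seconds) for small
    appends.  Each write also bumps the mtime, so an honest fsync has
    metadata to flush on every iteration and should not be cheaper."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    timings = []
    for sync in (os.fsync, _fdatasync):
        start = time.perf_counter()
        for _ in range(iters):
            os.write(fd, b"x")
            sync(fd)
        timings.append((time.perf_counter() - start) / iters)
    os.close(fd)
    return tuple(timings)
```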

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Unbalanced Btree Indices ...

2004-03-21 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> Pointers to docs that I'm not finding most acceptable ...

There aren't any, because at the time the 7.4 docs were being prepared,
we didn't have enough field experience to know whether its version of
index compaction would eliminate the practical need for REINDEX or not.
So the docs are intentionally vague.

We still haven't gotten a lot of feedback about the point from 7.4
users.  Maybe no news is good news, or maybe it isn't ...

> Assuming that a rebuild is required, is there anyway of seeing how the
> index is balanced, to know when to do it?

Balance isn't a consideration for b-trees, because every part of the
tree is always the same depth.  The only criterion is whether the
physical size of the index is unreasonable compared to the number of
entries it contains.  "VACUUM VERBOSE" is sufficient to find out about
that, though I suppose it could give you a more direct index loading
figure than it does --- right now you have to do your own counting on
your fingers to estimate the average tuple size.
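The finger-counting amounts to comparing the index's physical page count against a rough floor derived from the entry count. A hypothetical back-of-envelope helper (the fill factor is an assumed steady-state value, and the page count and average tuple size would come from your VACUUM VERBOSE output):

```python
BLOCK_SIZE = 8192  # PostgreSQL's default page size

def expected_index_pages(ntuples, avg_tuple_bytes, fill_factor=0.7):
    """Rough lower bound on b-tree pages: entries packed at the assumed
    leaf fill factor.  If the page count reported by VACUUM VERBOSE is
    many times this figure, the index is a REINDEX candidate."""
    per_page = max(1, int(BLOCK_SIZE * fill_factor) // avg_tuple_bytes)
    return max(1, -(-ntuples // per_page))  # ceiling division

def bloat_ratio(actual_pages, ntuples, avg_tuple_bytes):
    """How many times larger the index is than the packed estimate."""
    return actual_pages / expected_index_pages(ntuples, avg_tuple_bytes)
```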

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during

2004-03-21 Thread Sailesh Krishnamurthy
> "Kevin" == Kevin Brown <[EMAIL PROTECTED]> writes:

>> The bigger problem though with this is that it makes the
>> problem of list overflow much worse.  The hard part about
>> shared memory management is not so much that the available
>> space is small, as that the available space is fixed --- we
>> can't easily change it after postmaster start.  The more finely

Again, I can suggest the shared memory MemoryContext we use in
TelegraphCQ that is based on the OSSP libmm memory manager. We use it
to grow and shrink shared memory at will.
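For comparison, the effect being described — a region that can grow after creation, unlike a System V segment sized once at postmaster start — can be sketched with a file-backed mmap. This is a rough stand-in for what libmm provides, not TelegraphCQ code:

```python
import mmap
import tempfile

# Create a 4 KB file-backed mapping, then grow it in place.  A plain
# SysV shared memory segment offers no equivalent of resize().
backing = tempfile.TemporaryFile()
backing.truncate(4096)
region = mmap.mmap(backing.fileno(), 4096)

backing.truncate(8192)   # grow the backing store first
region.resize(8192)      # then extend the mapping over it

region[8190:8192] = b"ok"
```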

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] execute command tag including affected rows count

2004-03-21 Thread Kris Jurka

Would it be possible to have the command completion tag for EXECUTE return
the affected row count?  The JDBC API says you should be able to get the
affected row count for any statement and I'm working on integrating a
patch that transforms regular statements to server side prepared
statements, and I'm stuck on this issue.

CREATE TABLE t (a int);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
PREPARE mystatement AS DELETE FROM t;
EXECUTE mystatement;

would ideally return "EXECUTE 2"

Kris Jurka


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Unbalanced Btree Indices ...

2004-03-21 Thread Arthur Ward
> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>> Pointers to docs that I'm not finding most acceptable ...
>
> There aren't any, because at the time the 7.4 docs were being prepared,
> we didn't have enough field experience to know whether its version of
> index compaction would eliminate the practical need for REINDEX or not.
> So the docs are intentionally vague.
>
> We still haven't gotten a lot of feedback about the point from 7.4
> users.  Maybe no news is good news, or maybe it isn't ...

While upgrading from 7.4.1 to 7.4.2 last week, our production system
dropped at least 2GB during a vacuum full, and a reindex following that
also dropped about 2GB. Does that count as index bloat? It "feels" like
it's much better than 7.3.x, but it could also be my imagination. I wasn't
watching the details of the vacuum since I was mostly interested in
whether pg_autovacuum was keeping up and/or whether our FSM settings
needed an increase.

If there's some particular metric for index bloat that you can rattle off
the top of your head, I'll be glad to report back on it in another weekend
or two, when we've had some time to build up some bloat again.


---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Reporting errors inside plpgsql/SPI queries

2004-03-21 Thread Tom Lane
I've been fooling with adding a report of the executing query to the
CONTEXT stack when an error happens within a query submitted by a
plpgsql function.  Since plpgsql submits all its queries through SPI,
the most convenient place to do this is in spi.c, and so the behavior
will also apply to queries submitted via SPI by user-written C
functions.

What I've currently got labels the failing query as a "SPI query",
for example

regression=# create or replace function foo(text) returns text as $$
regression$# begin
regression$# execute 'select * from ' || $1;
regression$# return 'good';
regression$# end
regression$# $$ language plpgsql;
CREATE FUNCTION

regression=# select foo('int4_tbl');
 foo
------
 good
(1 row)

regression=# select foo('nosuch_tbl');
ERROR:  relation "nosuch_tbl" does not exist
CONTEXT:  SPI query "select * from nosuch_tbl"
PL/pgSQL function "foo" line 2 at execute statement

regression=# select foo('fee fie fo fum');
ERROR:  syntax error at or near "fo" at character 23
CONTEXT:  SPI query "select * from fee fie fo fum"
PL/pgSQL function "foo" line 2 at execute statement

Although this is quite reasonable for queries submitted by user-written
C functions, I'm worried that plpgsql programmers will be confused
because they've never heard of SPI.  I toyed with saying "SQL query"
instead, but that seems pretty nearly content-free ... it doesn't
distinguish these queries from ones submitted directly by the client.
Can anyone think of a better wording?  Does this bother people enough
to justify hacking the SPI interface to allow a label to be passed in?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Syntax error reporting (was Re: [PATCHES] syntax error position "CREATE FUNCTION" bug fix)

2004-03-21 Thread Tom Lane
Fabien COELHO <[EMAIL PROTECTED]> writes:
> I agree with you that both reports should not look the same.

> The good news is that they already do not look the same, thanks
> to the CONTEXT information.

Right, but you quite properly didn't like my quick-hack to psql that
assumes that the presence of any CONTEXT means it's not a top-level
syntax error.  I would like to replace that hack with something cleaner.

>> We have in fact misimplemented it, because it is being set for syntax
>> errors in internally generated queries too.

> Well, from the parser point of view, it is a little bit messy to have
> to do different things for error reporting in different contexts. That
> is why I would try a one-size-fits-all solution.

The parser needn't do anything different.  What I'm imagining is that
for internally submitted queries, there will always be an
error_context_stack routine that can transform the error report into
whatever we decide the appropriate format is.

>> However it might be better to invent a new error-message field that
>> carries just the text of the SQL command, rather than stuffing it into
>> CONTEXT.

> I'm not sure I would like the CONTEXT field for that, as it may be
> useful for a lot of things. In your example above, the CONTEXT is filled
> with two different pieces of information that are simply mixed together
> for the client. If localisation is used, there would be no way for a
> client to separate them. Different information should require different fields.

The point is that CONTEXT is essentially a record of "how we got here".
In a situation where the actual error occurs inside a couple of levels
of nesting, you want to be able to report the outer queries as well as
the one that directly caused the error.  I agree that there's probably
little hope of clients automatically making sense of the CONTEXT info.
But we need to provide it to help people debug complex functions.

> Moreover, I have other ideas for CONTEXT, which should really be a stack.

It already is a stack.

>> The other thing to think about is whether we should invent a new field
>> to carry syntax error position for generated queries, rather than making
>> 'P' do double duty as it does now.

> I think a new field is alas necessary.

I'm leaning in that direction also.  How about

'P': used only for syntax error position in the client-submitted query.

'p': syntax error position in an internally-generated query.

'q': text of an internally-generated query ('p' and 'q' would always
 appear together).

In the case of a non-syntax error in an internally generated query, we
should stick the query text into the CONTEXT stack, since it might not
be the most closely nested context item anyway.

An existing client will ignore the 'p' and 'q' fields, thus providing
behavior that's no worse than what you get with 7.4 now.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Reporting errors inside plpgsql/SPI queries

2004-03-21 Thread Joe Conway
Tom Lane wrote:
> Although this is quite reasonable for queries submitted by
> user-written C functions, I'm worried that plpgsql programmers will
> be confused because they've never heard of SPI.  I toyed with saying
> "SQL query" instead, but that seems pretty nearly content-free ... it
> doesn't distinguish these queries from ones submitted directly by the
> client. Can anyone think of a better wording?

"Embedded query"?

> Does this bother people enough to justify hacking the SPI interface
> to allow a label to be passed in?

That may be the only way to have the message make sense in all contexts.

Joe

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] execute command tag including affected rows count

2004-03-21 Thread Neil Conway
On 21-Mar-04, at 11:39 AM, Kris Jurka wrote:
> Would it be possible to have the command completion tag for EXECUTE
> return the affected row count?

Yes, this has been suggested before -- for example:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg26787.html

I agree it would be a good idea, although I can't recall what the 
conclusion of the previous discussions was -- did anyone raise any 
critical objections to this?

-Neil

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] 7.4.2 Build broken on (Sparc) Solaris 7 and 8

2004-03-21 Thread Jim Seymour
Bruce Momjian <[EMAIL PROTECTED]> wrote:
> 
[snip]
> 
> OK, patch attached and applied.  It adds configure tests for the 5-arg
> version of getpwuid_r() and properly falls back to the Posix draft
> version you have on Solaris.  Seems Solaris 9 also still has the draft
> version.
[snip]

Well, yes and no.  If you define _POSIX_PTHREAD_SEMANTICS, you get the
5-arg version.  It looks like this has been the case at least back to
Solaris 2.5.1.

I didn't really expect anything prior to 2.5.1 to be an issue, so I
didn't bother looking into fixes for anything beyond that.

Regards,
Jim

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] pg_autovacuum next steps

2004-03-21 Thread Matthew T. O'Connor
Lately I have been thinking about the next steps for the pg_autovacuum
daemon.  I have written up a document that describes what I'm planning
to do next.  Please read the attached and respond, as I would really
like some feedback.

Thanks,

Matthew O'Connor




pg_autovacuum_v2_writeup.rtf
Description: RTF file

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] execute command tag including affected rows count

2004-03-21 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On 21-Mar-04, at 11:39 AM, Kris Jurka wrote:
>> Would it be possible to have the command completion tag for EXECUTE 
>> return
>> the affected row count?

> Yes, this has been suggested before -- for example:

> http://www.mail-archive.com/[EMAIL PROTECTED]/msg26787.html

> I agree it would be a good idea, although I can't recall what the 
> conclusion of the previous discussions was -- did anyone raise any 
> critical objections to this?

The previous discussion concluded that EXECUTE should return the command
tag of the executed command --- eg, "UPDATE n" not "EXECUTE n".  Without
this, you cannot for example sensibly cope with both the INSERT and
UPDATE tag formats.
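The reason the executed command's own tag matters is that the formats differ: INSERT tags carry an extra OID field before the count. A hypothetical client-side parser, illustrative only and not actual JDBC driver code:

```python
def affected_rows(tag):
    """Extract the affected-row count from a command completion tag.
    "UPDATE 2" and "DELETE 2" put the count last; "INSERT 17132 1" adds
    an OID before it, so taking the final token covers both formats.
    Returns None for tags carrying no count, e.g. a bare "EXECUTE"."""
    parts = tag.split()
    if len(parts) >= 2 and parts[-1].isdigit():
        return int(parts[-1])
    return None
```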

Not sure what it takes internally to the backend to make this happen
... but IIRC there already is provision for utility commands to override
their default tag, so I hope that it would be a pretty localized change.

However, does this really solve Kris' problem?  JDBC generally likes to
think that it works with old Postgres versions; will a fix that only
works in >= 7.5 be good enough for them?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Tom Lane
"Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
> Lately I have been thinking about the next steps for the pg_autovacuum
> daemon.  I have written up a document that describes what I'm planning
> to do next.   Please read the attached and response as I would really
> like some feedback.

> [ rtf document ]

Please repost in some less proprietary format.  Plain text is generally
considered the thing to use on this list.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Markus Bertheau
On Mon, 22.03.2004 at 00:12, Tom Lane wrote:
> Please repost in some less proprietary format.  Plain text is generally
> considered the thing to use on this list.

-- 
Markus Bertheau <[EMAIL PROTECTED]>

Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Peter Eisentraut
Matthew T. O'Connor wrote:
> Lately I have been thinking about the next steps for the
> pg_autovacuum daemon.  I have written up a document that describes
> what I'm planning to do next.   Please read the attached and response
> as I would really like some feedback.

I think these configuration issues will become a lot easier if you make 
the autovacuum daemon a subprocess of the postmaster (like, say, the 
checkpoint process).  Then you have access to a host of methods for 
storing state, handling configuration, etc.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Christopher Kings-Lynne
> I think these configuration issues will become a lot easier if you make
> the autovacuum daemon a subprocess of the postmaster (like, say, the
> checkpoint process).  Then you have access to a host of methods for
> storing state, handling configuration, etc.

Yeah - why delay making it a backend process? :)

Chris

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] execute command tag including affected rows count

2004-03-21 Thread Kris Jurka


On Sun, 21 Mar 2004, Tom Lane wrote:

> > On 21-Mar-04, at 11:39 AM, Kris Jurka wrote:
> >> Would it be possible to have the command completion tag for EXECUTE 
> >> return
> >> the affected row count?
> 
> However, does this really solve Kris' problem?  JDBC generally likes to
> think that it works with old Postgres versions; will a fix that only
> works in >= 7.5 be good enough for them?
> 

I oversimplified the description of the current patch I'm working on;
server-side prepared statement support has been in the driver since the
7.3 days, so this problem already exists and has simply gone unnoticed.
With the current code it is difficult to enable server-prepared
statements, and doing so requires writing pg-specific code outside of
the standard JDBC API.  The patch I'm reviewing makes it simple to turn
on server-prepared statements globally, and it brought this bug to my
attention.  I would say "better late than never."

Kris Jurka

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Matthew T. O'Connor
On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:
> > I think these configuration issues will become a lot easier if you make 
> > the autovacuum daemon a subprocess of the postmaster (like, say, the 
> > checkpoint process).  Then you have access to a host of methods for 
> > storing state, handling configuration, etc.
> 
> Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation.

My reasons:
A) I wasn't sure if people really thought this was ready to be
integrated.  Tom had said a while ago that it was good to keep it as
a contrib module while it's still actively being developed.

B) Perhaps people like the idea of it being a client app (I don't think
so.)

C) Most importantly, I'm not a backend hacker.  If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there.  A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

So the first big question is: do we want to make it a backend
subprocess now?

Secondly, are there any other features that people are interested in
that were not mentioned in my document?


Matthew O'Connor








---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] execute command tag including affected rows count

2004-03-21 Thread Bruce Momjian
Neil Conway wrote:
> On 21-Mar-04, at 11:39 AM, Kris Jurka wrote:
> > Would it be possible to have the command completion tag for EXECUTE 
> > return
> > the affected row count?
> 
> Yes, this has been suggested before -- for example:
> 
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg26787.html
> 
> I agree it would be a good idea, although I can't recall what the 
> conclusion of the previous discussions was -- did anyone raise any 
> critical objections to this?

Should this be a TODO?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Matthew T. O'Connor
On Sun, 2004-03-21 at 18:12, Tom Lane wrote:
> "Matthew T. O'Connor" <[EMAIL PROTECTED]> writes:
> > [ rtf document ]
> 
> Please repost in some less proprietary format.  Plain text is generally
> considered the thing to use on this list.

I don't think RTF is proprietary, but I should have just posted inline
anyway, so here is a copy:

pg_autovacuum Version 2
Design Document:


Exec Summary:
pg_autovacuum was initially released as a contrib module in PostgreSQL
v7.4.   The version in 7.4 is by design very simple.  No configuration
is required, and very little configuration is possible.  Despite these
limitations it was voted the most popular new feature of PostgreSQL v7.4
according to the survey held on postgresql.org
(http://www.postgresql.org/survey.php?View=1&SurveyID=23).  

Despite its popularity, there is much room for improvement.  This
document sets out to define the most important improvements that would
help pg_autovacuum to become a truly powerful asset to the suite of
tools that come with PostgreSQL.


Current Problems & Limitations:

Based on user feedback from people using pg_autovacuum in the field, and
my own observations, there are a number of problems and limitations with
pg_autovacuum.  They are:

* Inability to customize thresholds on a per table basis
* Inability to set default thresholds on a per database basis
* Inability to exclude specific databases / tables from pg_autovacuum
monitoring
* Inability to schedule vacuums during off-peak times
* Lack of integration related to startup and shutdown
* Ignorance of VACUUM and ANALYZE operations performed outside
pg_autovacuum (requires backend integration, or can listen/notify be
used?)
* Lack of logging options / syslog integration / log rotation options
* Create table fails because template1 is busy

I'm not sure how to address all of these concerns, or that they all
should be addressed right now.  One of my big questions is backend
integration.  I am leaning towards leaving pg_autovacuum as a client
application in contrib for one more release.  During this time, I can
continue to tweak and improve pg_autovacuum so that we will have a very
good idea what the final product should be before we make it a standard
backend process.


For PostgreSQL 7.5,  I plan to implement these new features:

 1. Per-database defaults and per-table thresholds (including total
exclusion)
 2. Persistent data
 3. Single-pass mode (external scheduling from cron, etc.)
 4. Off-peak scheduling

1. Per-Database Defaults and Per-Table Thresholds:

There are differing opinions as to the best way to provide this
feature.  The primary debate is where to save the configuration data.  I
see three options:

 1. Store config data inside a special pg_autovacuum table inside
existing databases that want custom settings.

 2. Use a config file.  This would require some additional coding to add
the required parsing, but is possible.

 3. Create a pg_autovacuum database inside any cluster that wants to
customize its settings.

Since many people do not like tools that clutter their databases by
adding tables, I think option 1 (adding a pg_autovacuum table to
existing databases) is right out.  Using a config file would be OK, but
would require additional parsing code.  My preference is option 3.
Since pg_autovacuum will (hopefully) eventually become an integrated
part of the backend, it will eventually be able to add the required
data to the system catalogs.  Given these two premises, as long as
pg_autovacuum remains a contrib module it could use its own database to
mimic having system tables.  If this database exists, it will be used;
if it does not exist, then pg_autovacuum will work just as it did in
the 7.4 release, with very limited options available to it.  The user
will be able to specify a non-default database.

Table Structure for database specific defaults and table specific
thresholds:

databases_defaults: (will reference the pg_database system table)
id  serial primary key
exclude_databaseboolean
default_vacuum_scaling_factor   float
default_vacuum_base_value   int
default_analyze_scaling_factor  float
default_analyze_base_value  int
dboid   oid references pg_database.oid

table_thresholds
id  serial primary key
exclude_table   boolean (exclude this table)
vacuum_scaling_factor   float   (equivalent to  -v)
vacuum_base_value   int (equivalent to -V)
vacuum_thresholdfloat   (if > 0, use this threshold)
analyze_scaling_factor  float   (equivalent to -a)
analyze_base_value  int (equivalent to -A)
analyze_threshold   float   (if > 0 use this threshold)
relid   oid references pg_class.oid


2. Persistent pg_autovacuum Data:

Right now pg_autovacuum has no me

Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Gavin Sherry
On Sun, 21 Mar 2004, Matthew T. O'Connor wrote:

> On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:
> > > I think these configuration issues will become a lot easier if you make
> > > the autovacuum daemon a subprocess of the postmaster (like, say, the
> > > checkpoint process).  Then you have access to a host of methods for
> > > storing state, handling configuration, etc.
> >
> > Yeah - why delay making it a backend process? :)
>
> Ok, well this was part of the reason to have this conversation.
>
> My reasons:
> A) I wasn't sure if people really thought this was ready to be
> integrated.  Tom had said a while ago, that it was a good to keep it as
> a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more
intelligent vacuuming: namely, maintaining a map (outside of shared
memory) of blocks which have been pushed out of the free space map for
VACUUM to visit (which requires a backend process), being aware of load
restrictions (i.e., allowing the user to vacuum only when the load
average is less than X, for example), and some other leveling stuff to
ensure that availability is consistent.  While this doesn't relate to
pg_autovacuum specifically, it'd be great if they could be released at
the same time, I think.
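The load-restriction idea reduces to polling the load average before starting a pass. A minimal Unix-only sketch; the threshold, polling interval, and timeout are invented values:

```python
import os
import time

def wait_until_quiet(max_load=2.0, poll_seconds=60, timeout_seconds=3600):
    """Block until the 1-minute load average drops below max_load,
    giving up after the timeout.  Returns True if the system became
    quiet, False if the deadline expired first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.getloadavg()[0] < max_load:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A vacuum daemon would call this before each pass, skipping or deferring work while the box is busy.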

>
> B) Perhaps people like the idea of it being a client app (I don't think
> so.)
>

I'd like to see it as part of the backend.

> C) Most importantly, I'm not backend hacker.  If someone wants to do the
> initial work of getting it running as a backend process, I can take it
> from there.  A while ago, Bruce offered to help me with any backend
> issues I might have, so perhaps with a little help I can take a run at
> it.

I'd be happy to help you out.

Gavin

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] pg_autovacuum next steps

2004-03-21 Thread Bruce Momjian
> > B) Perhaps people like the idea of it being a client app (I don't think
> > so.)
> >
> 
> I'd like to see it as part of the backend.
> 
> > C) Most importantly, I'm not a backend hacker.  If someone wants to do the
> > initial work of getting it running as a backend process, I can take it
> > from there.  A while ago, Bruce offered to help me with any backend
> > issues I might have, so perhaps with a little help I can take a run at
> > it.
> 
> I'd be happy to help you out.

Agreed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] 7.4.2 Build broken on (Sparc) Solaris 7 and 8

2004-03-21 Thread Bruce Momjian
Jim Seymour wrote:
> Bruce Momjian <[EMAIL PROTECTED]> wrote:
> > 
> [snip]
> > 
> > OK, patch attached and applied.  It adds configure tests for the 5-arg
> > version of getpwuid_r() and properly falls back to the Posix draft
> > version you have on Solaris.  Seems Solaris 9 also still has the draft
> > version.
> [snip]
> 
> Well, yes and no.  If you define _POSIX_PTHREAD_SEMANTICS, you get the
> 5-arg version.  It looks like this has been the case at least back to
> Solaris 2.5.1.
> 
> I didn't really expect anything prior to 2.5.1 to be an issue, so I
> didn't bother looking into fixes for anything beyond that.

Oh, very interesting.  CVS HEAD has in template/solaris:

# tools/thread/thread_test must be run
if test "$GCC" = yes
then    THREAD_CPPFLAGS="-D_POSIX_PTHREAD_SEMANTICS"
        THREAD_LIBS="-pthread"
else    THREAD_CPPFLAGS="-mt -D_POSIX_PTHREAD_SEMANTICS"
        THREAD_LIBS="-lpthread"
fi

I added the "-D_POSIX_PTHREAD_SEMANTICS" flag post-7.4.2, so it seems
this platform would work even without checking for the 4-arg
getpwuid_r version.  However, I noticed that 'man getpwuid_r' only
mentions the 4-arg version.

I will leave the 4-arg check in. The original author thought it might be
needed, and the Solaris manual mentions it, so odds are some other
platforms will hit it too, and perhaps not have the 5-arg version.

Thanks for the research.  I will add a mention in the solaris template
file.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] UnixWare/CVS Tip/initdb.c needs to use threads

2004-03-21 Thread Bruce Momjian
Larry Rosenman wrote:
>  The a.out (not any library) should be linked with -Kpthread (not 
> -lpthread).
> This will force libthread to be linked in the right order relative to
> libc, libC, networking libraries, etc.
> 
> In other words, the entire application either is or is not linked with
> threads; it's not a property of an individual library.
> 
> 
> SO, IF we are using the threads flags, we need to use them on ALL 
> libpq-using programs, ours or the users'.

Seems we have a few options for making threaded libpq on Unixware:

o  remove thread-safe SIGPIPE code, which calls thread library

o  create a threaded and non-threaded libpq library

o  add a libpq function that enables threading, and do dynamic
linking of thread calls based on that function

o  Add thread flags to all builds on that platform, including
the backend

o  Add the ability to specify compile/link flags for everything
but the backend

As I remember, libcrypt used to be required by all libpq builds on
various platforms.  This seems to be a similar case.

I think the last option might be the best: somehow create different
cppflags/libs for the backend and non-backend programs.
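(Aside, not from the thread: that last option might look roughly like the
fragment below. The THREAD_* names follow the template files; the
CLIENT_* names and targets are invented purely for illustration.)

```make
# Hypothetical sketch: keep backend and client flags separate so only
# libpq-using programs pick up the thread switches.  CLIENT_CPPFLAGS
# and CLIENT_LIBS are invented names, not real PostgreSQL variables.
CLIENT_CPPFLAGS = $(CPPFLAGS) $(THREAD_CPPFLAGS)
CLIENT_LIBS     = $(LIBS) $(THREAD_LIBS)

# Backend objects: built without any thread flags
postgres: $(BACKEND_OBJS)
	$(CC) $(CFLAGS) $(BACKEND_OBJS) $(LIBS) -o $@

# libpq-using client programs: built with -Kpthread et al.
psql: psql.o
	$(CC) $(CFLAGS) $(CLIENT_CPPFLAGS) psql.o -lpq $(CLIENT_LIBS) -o $@
```

The point is just that the split happens in the makefile variables, so
the backend never sees -Kpthread while every libpq consumer does.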

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] COPY formatting

2004-03-21 Thread Karel Zak
On Fri, Mar 19, 2004 at 09:54:37AM -0500, Tom Lane wrote:
> Karel Zak <[EMAIL PROTECTED]> writes:
> >  The problem with CSV is that it will work correctly with the new
> >  protocol only, because old versions of clients are newline sensitive.
> 
> Why?  The client-side code doesn't have any real say over the meaning of
> the data, at least not in psql-class clients.  I suppose a client app
> that tries to interpret the data could get confused, but psql sure
> doesn't do that.

 In libpq (fe-protocol2.c), pqGetCopyData2() and pqGetline2() cut the
 string behind '\n'.

Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>
 http://home.zf.jcu.cz/~zakkr/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])