Re: [HACKERS] Noisy CVS updates

2008-09-12 Thread Decibel!

On Sep 5, 2008, at 9:06 AM, D'Arcy J.M. Cain wrote:
...etc.  Would it be OK if I went in and added .cvsignore files to keep
the noise level down?  I guess I could have my daily script filter them
out, but there may be times when there really is an unexpected file and
I want to follow up on it.



+1. It might be possible to get screwed by not doing a distclean, but  
the build time savings seems worth it (first thing I do if I get a  
build error is make clean/distclean).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] New FSM allocation policy

2008-09-12 Thread Decibel!

On Sep 5, 2008, at 9:43 PM, Bruce Momjian wrote:

Fortunately there's an easy fix for that. If we optimize
RecordAndGetPageWithFreeSpace so that it will always return the next
page if it has enough space, we'll be doing sequential I/O again.
That's trivial as long as the next heap page is on the same FSM page,
and probably not too hard even if it's not. If we limit this
optimization to within the same FSM page, we'll effectively be filling
32MB stripes fully.


Thoughts?

I'm running more tests, and there's more issues that need discussion,
but I'll start separate threads for each. I'll also post an updated
patch separately.


One other thing to keep in mind is that VACUUM can reduce a table's
size if the trailing blocks are empty, so there is some gain if the
earlier parts of the table are preferred for inserts.



Yeah; I would actually really, really like to see a mode you could  
set on a table that says "I want to try and shrink this table". One  
of the things that would mean is that the FSM should prefer pages at  
the beginning of the heap.


Also related to this is the idea of asking the FSM for pages within a  
specific range so that you can try and maintain cluster order on a  
table. You would look in the clustering index for the closest value  
to your key and where it is in the heap and then ask for a page in  
that neighborhood. (You'd probably want to look at more than just one  
index tuple, but you get the idea).
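
To make the neighborhood idea concrete, here is a standalone toy sketch
in C (the function and its interface are invented for illustration;
today's FSM offers nothing like it):

#include <stdio.h>

/*
 * Toy model of asking the FSM for a page "in the neighborhood" of a
 * target block.  fsm[] holds free bytes per heap page; we search
 * outward from target and return the nearest page with enough room,
 * or -1 if none.  Purely illustrative.
 */
static int
page_with_space_near(const int *fsm, int npages, int target, int need)
{
    for (int dist = 0; dist < npages; dist++)
    {
        if (target - dist >= 0 && fsm[target - dist] >= need)
            return target - dist;
        if (target + dist < npages && fsm[target + dist] >= need)
            return target + dist;
    }
    return -1;
}

int
main(void)
{
    int fsm[] = {0, 120, 4000, 80, 2500, 8000};

    /* tuple should go near block 3; block 2 is the closest with room */
    printf("page %d\n", page_with_space_near(fsm, 6, 3, 1000));
    return 0;
}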

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes:
> Cool.  I think I have it pretty much working with a new
> GUC "intervalstyle" that can take values of

> "sql_standard" that I think will output SQL standard
> interval literals when given a sql
> standard interval.

> "iso_8601" that will output ISO 8601 "Time Intervals" of
> the "format with time-unit deignators", and

> "backward_compatible" that will output the same thing
> that postgres currently does that depends
> on the value of the DateStyle GUC.

Actually, we have never considered that new releases need to preserve
the behavior of postgresql.conf settings.  So the above seems
unnecessarily baroque.  How about decoupling interval_out's behavior
from DateStyle altogether, and instead providing values of IntervalStyle
that match all the previous behaviors?
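
Concretely, that could mean one explicit setting per existing behavior,
instead of a mode that consults DateStyle; for example (the value names
here are illustrative, not a committed interface):

SET intervalstyle TO postgres;          -- what DateStyle 'iso' prints today
SELECT interval '1 year 1 minute';      -- 1 year 00:01:00
SET intervalstyle TO postgres_verbose;  -- what DateStyle 'sql' prints today
SELECT interval '1 year 1 minute';      -- @ 1 year 1 min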

> Should those ECPG functions be made identical to
> the ones in the backend?

The ECPG situation is a mess :-(.  That code was forked off from the
backend some time ago, and has not been well maintained at all.  If you
are brave enough to tackle that mess, more power to you; but I strongly
suggest doing it as an independent patch.

> Could those somehow share code with the backend for
> some of their work?

The palloc and elog dependencies seem to be the hard part.

regards, tom lane



[HACKERS] Upcoming back-branch update releases

2008-09-12 Thread Tom Lane
It's been three months since our last release set, and in view of the
recent discovery of a potential-data-loss issue in 8.3, it seems that
we should get moving on another one.  Accordingly, core has decided that
we will freeze a set of update releases next Thursday (9/18) for public
announcement Monday 9/22.  Get those last minute bug fixes in ...

regards, tom lane



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Ron Mayer

Tom Lane wrote:

I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0?  otherwise your output is just dd which
is uncomfortably ambiguous).


Cool.  I think I have it pretty much working with a new
GUC "intervalstyle" that can take values of

"sql_standard" that I think will output SQL standard
   interval literals when given a sql
   standard interval.

"iso_8601" that will output ISO 8601 "Time Intervals" of
   the "format with time-unit deignators", and

"backward_compatible" that will output the same thing
   that postgres currently does that depends
   on the value of the DateStyle GUC.

I'll add the documentation and regression tests and
can submit a patch early next week.  Oh, one more
question: under ecpg there seems to be a fair
amount of near-duplicated code (EncodeDateTime,
EncodeInterval) for turning dates and times and
intervals into strings.

Should those ECPG functions be made identical to
the ones in the backend?
Could those somehow share code with the backend for
some of their work?


Anyway - here's a quick test of the
SQL Standard and ISO interval output as it stands
right now...



regression=# drop table test_intervals;
DROP TABLE
regression=# create temporary table test_intervals (i interval);
CREATE TABLE
regression=# insert into test_intervals values
regression-#   ('0 years'),
regression-#   ('1 year 1 month'),
regression-#   ('1 day 2 hours 3 minutes 4 seconds'),
regression-#   ('1 year 1 minute');
INSERT 0 4
regression=#
regression=# insert into test_intervals values
regression-#   ('1-1'),
regression-#   ('1'),
regression-#   (interval '1' year),
regression-#   ('1:00:00'),
regression-#   ('1 1:02:03');
INSERT 0 5
regression=#
regression=# insert into test_intervals values
regression-#   ('P1Y1M'),
regression-#   ('P1DT1H1M1S'),
regression-#   ('PT1S');
INSERT 0 3
regression=#
regression=# set intervalstyle to sql_standard;
SET
regression=# select * from test_intervals;
  i
-
 0
 1-1
 1 2:3:4
 1-0 0 0:1:0
 1-1
 0:0:1
 1-0
 1:0:0
 1 1:2:3
 1-1
 1 1:1:1
 0:0:1
(12 rows)


regression=#
regression=# set intervalstyle to iso_8601;
SET
regression=# select * from test_intervals;
 i

 PT0S
 P1Y1M
 P1DT2H3M4S
 P1YT1M
 P1Y1M
 PT1S
 P1Y
 PT1H
 P1DT1H2M3S
 P1Y1M
 P1DT1H1M1S
 PT1S
(12 rows)

regression=#
regression=# set intervalstyle to backward_compatible;
SET
regression=# set datestyle to sql;
SET
regression=# select * from test_intervals;
   i
---
 @ 0
 @ 1 year 1 mon
 @ 1 day 2 hours 3 mins 4 secs
 @ 1 year 1 min
 @ 1 year 1 mon
 @ 1 sec
 @ 1 year
 @ 1 hour
 @ 1 day 1 hour 2 mins 3 secs
 @ 1 year 1 mon
 @ 1 day 1 hour 1 min 1 sec
 @ 1 sec
(12 rows)

regression=# set datestyle to iso;
SET
regression=# select * from test_intervals;
i
-
 00:00:00
 1 year 1 mon
 1 day 02:03:04
 1 year 00:01:00
 1 year 1 mon
 00:00:01
 1 year
 01:00:00
 1 day 01:02:03
 1 year 1 mon
 1 day 01:01:01
 00:00:01
(12 rows)



Re: [HACKERS] Potential Join Performance Issue

2008-09-12 Thread Lawrence, Ramon
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> I was intending to do it the other way, actually.  An extra field in
> HashPath hardly costs anything.  The other reason for it is that there
> are other possible uses for knowing whether a hash will be multi-batch.
> (For example, if we were prepared to tell the executor that it *must*
> keep the hash to one batch, we could assume that the sort order of the
> left input is preserved.  I haven't looked into the risks/benefits of
> that too much, but it's been in the back of the mind for a long time.)

Having the number of batches in HashPath could be potentially useful for
a variety of reasons.  For our research, we have added an nbatch
variable in both HashPath and HashJoin.  Having it in HashJoin is useful
as we modified EXPLAIN to output the number of batches.  There are costs
in putting an nbatch variable in HashPath as the system may set this
variable potentially hundreds/thousands of times during costing and does
not (currently) use it until you convert the chosen HashPath to a plan.

> I'd be more inclined to deal with the issue by trying to establish a
> "safety margin" in the estimate of whether the hash will go multi-batch.
> IOW we should disuse_physical_tlist if the hash is estimated to be
> close to but still within one batch.

Our experiments with large TPC-H 1GB joins show that it is almost always
better to not use physical_tlists if the number of batches is > 1.
There is a noticeable (approximately 5-15%) improvement when using
physical_tlists for in-memory joins.  For batches of size 2, it
sometimes can go either way depending how many attributes are projected
out of the outer relation.  Using physical_tlists may be better even for
batches of size 2 if most of the attributes of the outer relation are
kept.  For a larger number of batches, the extra I/O cost significantly
dominates over the physical_tlist optimization.  Performance of
multi-batch joins may improve 50% or more by disabling the optimization.

It is possible to create a "safety margin" by having
ExecChooseHashTableSize() return the value
inner_rel_bytes/hash_table_bytes, which represents the fraction of the
available memory that the inner relation is expected to consume.  You
can then make decisions based on that.  However, this is only as good
as the inner relation size estimate, and especially for large queries
the estimate may be quite inaccurate.  A more robust solution could
examine the "width" of the path and the "width" of the relation combined
with the number of batches to see if projecting early would be worth it.
It may be best to keep it simple and just use number of batches > 1 as
the criterion, and instead focus on examining issues with inaccurate
join size estimates.
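
As a standalone sketch (the threshold is an invented number, and this is
not actual planner code), the combined rule might look like:

#include <stdio.h>
#include <stdbool.h>

/*
 * Toy version of the decision discussed above: keep the physical-tlist
 * optimization only for in-memory hashes, and only when the inner
 * relation is not estimated to nearly fill the available memory.
 * The 25% safety margin is an invented number, not planner policy.
 */
static bool
use_physical_tlist(double inner_rel_bytes, double hash_table_bytes,
                   int est_nbatch)
{
    double fill = inner_rel_bytes / hash_table_bytes;

    return est_nbatch == 1 && fill < 0.75;
}

int
main(void)
{
    printf("%d\n", use_physical_tlist(10e6, 16e6, 1)); /* 1: in memory */
    printf("%d\n", use_physical_tlist(15e6, 16e6, 1)); /* 0: too close */
    printf("%d\n", use_physical_tlist(40e6, 16e6, 2)); /* 0: multi-batch */
    return 0;
}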

--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of
British Columbia Okanagan
E-mail: [EMAIL PROTECTED]



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Ron Mayer

Ron Mayer wrote:

Tom Lane wrote:

I think all you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0?  otherwise your output is just dd which
is uncomfortably ambiguous).


Oh, and if both parts are 0, I guess we desire
the (more comfortable than the alternatives) '0'?



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Ron Mayer

Tom Lane wrote:

The reason it's not SQL-standard is the data value isn't.
So not a problem.  Someone conforming to the spec limits on
what he puts in will see spec-compliant output.  I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0?  otherwise your output is just dd which
is uncomfortably ambiguous).


Great.  That's what I'll do.

Any convention or preference on the naming of the GUC?
I assume "intervalstyle" is reasonable?

Or thoughts regarding the current EncodeInterval() that's
already using the "datestyle" GUC?


pg82=# select interval '1';
 interval
--
 00:00:01
(1 row)

pg82=# set datestyle='sql';
SET

pg82=# select interval '1';
 interval
--
 @ 1 sec
(1 row)



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes:
> Trying to do the SQL-standard output now, and have a question
> of what to do in the SQL-standard mode when trying to output
> an interval that has both a YEAR and a DAY component.

> AFAICT the SQL standard doesn't let you have both, so the
> "SQL-standard" output actually won't be.

The reason it's not SQL-standard is the data value isn't.
So not a problem.  Someone conforming to the spec limits on
what he puts in will see spec-compliant output.  I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0?  otherwise your output is just dd which
is uncomfortably ambiguous).
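
As a standalone toy (not the backend code; the day-in-the-middle case
follows the session output Ron posted elsewhere in this digest, e.g.
'1-0 0 0:1:0' for '1 year 1 minute'):

#include <stdio.h>

/*
 * Toy rendering of the rule above: print yyy-mm only if nonzero, the
 * time only if nonzero, the day whenever it is nonzero or sandwiched
 * between the other two parts, and a bare '0' when everything is zero.
 * Illustrative only.
 */
static void
sql_standard_interval(char *buf, size_t buflen, int years, int months,
                      int days, int hours, int mins, int secs)
{
    int has_ym = (years != 0 || months != 0);
    int has_t = (hours != 0 || mins != 0 || secs != 0);
    int show_d = (days != 0 || (has_ym && has_t));
    int n = 0;

    if (!has_ym && !show_d && !has_t)
    {
        snprintf(buf, buflen, "0");
        return;
    }
    if (has_ym)
        n += snprintf(buf + n, buflen - n, "%d-%d", years, months);
    if (show_d)
        n += snprintf(buf + n, buflen - n, "%s%d", has_ym ? " " : "", days);
    if (has_t)
        n += snprintf(buf + n, buflen - n, "%s%d:%d:%d",
                      (has_ym || show_d) ? " " : "", hours, mins, secs);
}

int
main(void)
{
    char buf[64];

    sql_standard_interval(buf, sizeof buf, 1, 0, 0, 0, 1, 0);
    puts(buf);                  /* 1-0 0 0:1:0 */
    sql_standard_interval(buf, sizeof buf, 0, 0, 1, 2, 3, 4);
    puts(buf);                  /* 1 2:3:4 */
    sql_standard_interval(buf, sizeof buf, 0, 0, 0, 0, 0, 0);
    puts(buf);                  /* 0 */
    return 0;
}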

regards, tom lane



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Ron Mayer

Tom Lane wrote:

somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.


Trying to do the SQL-standard output now, and have a question
of what to do in the SQL-standard mode when trying to output
an interval that has both a YEAR and a DAY component.

AFAICT the SQL standard doesn't let you have both, so the
"SQL-standard" output actually won't be.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Alvaro Herrera
Hannu Krosing wrote:
> On Fri, 2008-09-12 at 17:45 +0100, Simon Riggs wrote:
> > On Fri, 2008-09-12 at 17:11 +0200, Csaba Nagy wrote:
> > 
> > > Why not have a design where the slave is in control of its own data?
> > > I mean the slave...
> > 
> > The slave only exists because it is a copy of the master. If you try to
> > "startup" a slave without first having taken a copy, how would you
> > bootstrap the slave? With what? To what? 
> 
> As I understand it, Csaba meant that the slave would "bootstrap itself" by
> connecting to the master in some early phase of startup, requesting a
> physical filesystem-level copy of the data, then commencing startup in
> Hot Standby mode.

Interesting ...

This doesn't seem all that difficult -- all you need is to start one
connection to get the WAL stream and save it somewhere; meanwhile a
second connection uses a combination of pg_file_read on master +
pg_file_write on slave to copy the data files over.  When this step is
complete, recovery of the stored WAL commences.
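
In rough SQL terms it might look like the sketch below; pg_read_file()
is the core superuser function and pg_file_write() comes from
contrib/adminpack, but the path, offsets, and the binary-safe handling
of the chunks are all hand-waved here:

-- connection 2, run against the master: read one chunk of a data file
SELECT pg_read_file('base/16384/16385', 0, 8192);

-- connection 2, run against the slave: append that chunk locally
SELECT pg_file_write('base/16384/16385', '...chunk from above...', true);

-- connection 1 meanwhile receives the WAL stream and saves it; when the
-- file copy is complete, recovery replays the saved WAL.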

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 17:45 +0100, Simon Riggs wrote:
> On Fri, 2008-09-12 at 17:11 +0200, Csaba Nagy wrote:
> 
> > Why not have a design where the slave is in control of its own data?
> > I mean the slave...
> 
> The slave only exists because it is a copy of the master. If you try to
> "startup" a slave without first having taken a copy, how would you
> bootstrap the slave? With what? To what? 

As I understand it, Csaba meant that the slave would "bootstrap itself" by
connecting to the master in some early phase of startup, requesting a
physical filesystem-level copy of the data, then commencing startup in
Hot Standby mode.

If done that way, all the slave needs is a superuser-level connection to
the master database.

Of course this can also be done with a little hot-standby startup script
on the slave, if shell access to the master is provided.
 
--
Hannu





Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread Alvaro Herrera
David E. Wheeler wrote:
> On Sep 12, 2008, at 11:35, David E. Wheeler wrote:
>
>> I've got another patch I'm working on adding support for "char" (and  
>> tests for char). Just to fill out a gap I saw in the casting coverage. 
>> I'm trying to get it done now. With that, AFAIK, citext will work just 
>> like text.
>
> Looks like the IO conversions handle char and "char", so the attached  
> patch just updates the regression test.

There are unresolved conflicts in the patch ...

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec

2008-09-12 Thread Ron Mayer

Tom Lane wrote:

Ron Mayer <[EMAIL PROTECTED]> writes:

... ISO 8601 intervals ...


On the output side, seems like a GUC variable
is the standard precedent here.  I'd still vote against overloading
DateStyle --- it does too much already --- but a separate variable for
interval style wouldn't bother me.  In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.


Is it OK that this wouldn't be backward compatible with the current
interval_out, which looks to me to be using the DateStyle GUC?

I suppose it could be made backward compatible if the new
IntervalStyle GUC defaulted to a value of "guess_from_datestyle",
but I fear an option like that might add rather than remove
confusion.



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 12, 2008, at 11:35, David E. Wheeler wrote:

I've got another patch I'm working on adding support for "char" (and  
tests for char). Just to fill out a gap I saw in the casting  
coverage. I'm trying to get it done now. With that, AFAIK, citext  
will work just like text.


Looks like the IO conversions handle char and "char", so the attached  
patch just updates the regression test.


Best,

David


char_casts.patch
Description: Binary data





Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 12, 2008, at 11:34, Tom Lane wrote:


CREATE OR REPLACE FUNCTION int8(citext)
RETURNS int8
AS 'SELECT int8( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;


Yeah, those are all replaced by the CoerceViaIO mechanism


Okay, thanks for the sanity check. The SQL versions are fine for me in  
8.3.


Best,

David



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 12, 2008, at 11:31, Tom Lane wrote:


"David E. Wheeler" <[EMAIL PROTECTED]> writes:

Oh, and text_name seems to give me this error:



 ERROR:  compressed data is corrupt



That's when I have this cast:



 CREATE OR REPLACE FUNCTION citext(name)
 RETURNS citext
 AS 'text_name'
 LANGUAGE internal IMMUTABLE STRICT;


I think you've got the direction backwards.


Oh. Duh.


BTW, I removed the "Limitations" entry about I/O casting not working
with citext; we fixed that, no?


Yes, we did. Thanks for the catch.

I've got another patch I'm working on adding support for "char" (and  
tests for char). Just to fill out a gap I saw in the casting coverage.  
I'm trying to get it done now. With that, AFAIK, citext will work just  
like text.


Best,

David




Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> Pity. Looks like there were only a few I wasn't using, text_char,  
> char_text, text_name, and texttoxml. Do I really need to keep all my  
> other casts like these in 8.3?

> CREATE OR REPLACE FUNCTION int8(citext)
> RETURNS int8
> AS 'SELECT int8( $1::text )'
> LANGUAGE SQL IMMUTABLE STRICT;

Yeah, those are all replaced by the CoerceViaIO mechanism.

regards, tom lane



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> Oh, and text_name seems to give me this error:

>   ERROR:  compressed data is corrupt

> That's when I have this cast:

>   CREATE OR REPLACE FUNCTION citext(name)
>   RETURNS citext
>   AS 'text_name'
>   LANGUAGE internal IMMUTABLE STRICT;

I think you've got the direction backwards.


BTW, I removed the "Limitations" entry about I/O casting not working
with citext; we fixed that, no?

regards, tom lane



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 12, 2008, at 11:14, David E. Wheeler wrote:

Pity. Looks like there were only a few I wasn't using, text_char,  
char_text, text_name, and texttoxml.


Oh, and text_name seems to give me this error:

 ERROR:  compressed data is corrupt

That's when I have this cast:

 CREATE OR REPLACE FUNCTION citext(name)
 RETURNS citext
 AS 'text_name'
 LANGUAGE internal IMMUTABLE STRICT;

This version does not give me an error:

 CREATE OR REPLACE FUNCTION citext(name)
 RETURNS citext
 AS 'SELECT text( $1 )::citext'
 LANGUAGE SQL IMMUTABLE STRICT;

Maybe I did something wrong?

Thanks,

David



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler


On Sep 12, 2008, at 11:06, David E. Wheeler wrote:


Er, look into pg_cast and then pg_proc?  For instance

select oid::regprocedure, prosrc from pg_proc
where oid in (select castfunc from pg_cast);


That looks like *exactly* what I need. Thanks!


Pity. Looks like there were only a few I wasn't using, text_char,  
char_text, text_name, and texttoxml. Do I really need to keep all my  
other casts like these in 8.3?


CREATE OR REPLACE FUNCTION int8(citext)
RETURNS int8
AS 'SELECT int8( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(int8)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

Thanks,

David



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 12, 2008, at 10:58, Tom Lane wrote:


1. Did I neglect to include the documentation patch? I've attached it
here. It's necessary because of the addition of the new functions.


Maybe it got left out of the later patch iterations?  Anyway,
will take care of it.


Great, thank you.


2. Many thanks for switching to using the network_show function
instead of the SQL-based casting I had. Can you tell me how to go
about finding such functions?


Er, look into pg_cast and then pg_proc?  For instance

select oid::regprocedure, prosrc from pg_proc
where oid in (select castfunc from pg_cast);


That looks like *exactly* what I need. Thanks!

Best,

David




Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> 1. Did I neglect to include the documentation patch? I've attached it  
> here. It's necessary because of the addition of the new functions.

Maybe it got left out of the later patch iterations?  Anyway,
will take care of it.

> 2. Many thanks for switching to using the network_show function  
> instead of the SQL-based casting I had. Can you tell me how to go  
> about finding such functions?

Er, look into pg_cast and then pg_proc?  For instance

select oid::regprocedure, prosrc from pg_proc
where oid in (select castfunc from pg_cast);
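
For instance, to narrow that down to the cast functions touching one
particular type:

select castsource::regtype, casttarget::regtype,
       castfunc::regprocedure
from pg_cast
where 'text'::regtype in (castsource, casttarget)
  and castfunc <> 0;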

regards, tom lane



Re: [HACKERS] [patch] fix dblink security hole

2008-09-12 Thread David Fetter
On Fri, Sep 12, 2008 at 01:14:36PM -0400, Alvaro Herrera wrote:
> Marko Kreen wrote:
> > Currently dblink allows regular users to initiate libpq connection
> > to user-provided connection string.  This breaks the default
> > policy that normal users should not be allowed to freely interact
> > with outside environment.
> 
> Since people are now working on implementing the SQL/MED stuff to
> manage connections,

I don't see any code for this.  Is there some?

> should we bounce this patch?  With luck, the CREATE CONNECTION (?)
> stuff will be done for the next commitfest and we can just switch
> dblink to use that instead.

That would be great :)

> http://archives.postgresql.org/message-id/[EMAIL PROTECTED]
> 
> Thoughts?  Can we really expect SQL/MED connection mgmt to be done
> for the next fest?

Connection management would be awesome.  The whole SQL/MED spec is
gigantic, tho.  Should we see about an implementation roadmap for the
parts we care about?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] [patch] fix dblink security hole

2008-09-12 Thread Alvaro Herrera
Marko Kreen wrote:
> Currently dblink allows regular users to initiate libpq connection
> to user-provided connection string.  This breaks the default
> policy that normal users should not be allowed to freely interact
> with outside environment.

Since people are now working on implementing the SQL/MED stuff to manage
connections, should we bounce this patch?  With luck, the CREATE
CONNECTION (?) stuff will be done for the next commitfest and we can
just switch dblink to use that instead.

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Thoughts?  Can we really expect SQL/MED connection mgmt to be done for
the next fest?

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-12 Thread David E. Wheeler

On Sep 5, 2008, at 11:33, David E. Wheeler wrote:


On Sep 5, 2008, at 11:30, Tom Lane wrote:


Thanks for reviewing.  I've committed this with your suggestions and
one additional non-cosmetic change: schema-qualify names in the
bodies of the SQL functions so that they are not search_path  
dependent.


Thanks, I'll check that out.


Finally got to this; sorry for the delay.

Two things I noticed:

1. Did I neglect to include the documentation patch? I've attached it  
here. It's necessary because of the addition of the new functions.


2. Many thanks for switching to using the network_show function  
instead of the SQL-based casting I had. Can you tell me how to go  
about finding such functions? Because for my 8.3 version of citext, I  
have a whole bunch of functions that do casting like this:


CREATE OR REPLACE FUNCTION int8(citext)
RETURNS int8
AS 'SELECT int8( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(int8)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION int4(citext)
RETURNS int4
AS 'SELECT int4( $1::text )'
LANGUAGE SQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citext(int4)
RETURNS citext
AS 'SELECT text( $1 )::citext'
LANGUAGE SQL IMMUTABLE STRICT;

...and so on. I'd love to be able to replace these (and many others)  
with internal C functions, if  only I could figure out what those  
functions were. A pointer to making that determination (if they even  
exist in 8.3) would be greatly appreciated.


Thanks,

David



citext_doc.patch
Description: Binary data





Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread Andrew Chernow

David Fetter wrote:

On Fri, Sep 12, 2008 at 12:30:12PM -0400, Alvaro Herrera wrote:

Joshua D. Drake wrote:

Heikki Linnakangas wrote:


The only thing I would say here is that you point the URL to
current  which will be wrong in one release. Perhaps something
that pulls the  pgversion macro?

We don't put URLs in error messages. The hint needs to be a real
sentence.

Which is exactly why it's the first thing we do when supporting
people.  Point them to the URL in the docs.

Let's add a new field in error reports: errurl()


Excellent idea :)

Cheers,
David.


If you start using URLs in error messages, it might be helpful to point
everything at a single URL and only change the query_string, maybe


http://www.postgresql.org/docs/?err_id

Not sure if err_id is the correct value for the query_string, but I'm
sure something like the above would simplify/centralize management.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Simon Riggs

On Fri, 2008-09-12 at 17:11 +0200, Csaba Nagy wrote:

> Why not have a design where the slave is in control of its own data?
> I mean the slave...

The slave only exists because it is a copy of the master. If you try to
"startup" a slave without first having taken a copy, how would you
bootstrap the slave? With what? To what? It sounds cool, but it's not
practical.

I posted a workable suggestion today on another subthread.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread David Fetter
On Fri, Sep 12, 2008 at 12:30:12PM -0400, Alvaro Herrera wrote:
> Joshua D. Drake wrote:
> > Heikki Linnakangas wrote:
> >
> >>> The only thing I would say here is that you point the URL to
> >>> current  which will be wrong in one release. Perhaps something
> >>> that pulls the  pgversion macro?
> >>
> >> We don't put URLs in error messages. The hint needs to be a real
> >> sentence.
> >
> > Which is exactly why it's the first thing we do when supporting
> > people.  Point them to the URL in the docs.
> 
> Let's add a new field in error reports: errurl()

Excellent idea :)

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread Alvaro Herrera
Joshua D. Drake wrote:
> Heikki Linnakangas wrote:
>
>>> The only thing I would say here is that you point the URL to current  
>>> which will be wrong in one release. Perhaps something that pulls the  
>>> pgversion macro?
>>
>> We don't put URLs in error messages. The hint needs to be a real sentence.
>
> Which is exactly why it's the first thing we do when supporting people.
> Point them to the URL in the docs.

Let's add a new field in error reports: errurl()
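
A standalone toy of the shape that could take (errurl() is hypothetical;
this just mimics how a message and hint ride along in an error report
today):

#include <stdio.h>

/* Toy error report with a hypothetical URL field alongside the
 * familiar message and hint; nothing here is actual ereport code. */
struct errdata
{
    const char *msg;
    const char *hint;
    const char *url;
};

static void
report(const struct errdata *e)
{
    fprintf(stderr, "FATAL:  %s\n", e->msg);
    if (e->hint)
        fprintf(stderr, "HINT:  %s\n", e->hint);
    if (e->url)
        fprintf(stderr, "URL:  %s\n", e->url);
}

int
main(void)
{
    struct errdata e = {
        "Ident authentication failed for user \"root\"",
        "See the client authentication chapter of the documentation.",
        "http://www.postgresql.org/docs/8.3/static/client-authentication.html"
    };

    report(&e);
    return 0;
}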

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread Joshua D. Drake

Heikki Linnakangas wrote:

The only thing I would say here is that you point the URL to current 
which will be wrong in one release. Perhaps something that pulls the 
pgversion macro?


We don't put URLs in error messages. The hint needs to be a real sentence.



Which is exactly why it's the first thing we do when supporting people.
Point them to the URL in the docs.


Perhaps the middle man isn't needed.

Joshua D. Drake




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Markus Wanner

Hi,

Andrew Dunstan wrote:
If this were a nice pluggable library I'd agree, but AFAIK it's not, and 
I don't see great value in reinventing the wheel.


I certainly agree.

However, I thought of it more like the archive_command, as proposed by
Hannu. That way we don't need to reinvent any wheel, and the standby
could still trigger the base data synchronization itself.


Regards

Markus Wanner




Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Heikki Linnakangas

Csaba Nagy wrote:

Why not have a design where the slave is in control of its own data?
I mean the slave could ask for the base files (possibly through a
special function deployed on the master), then ask for the WAL stream
and so on. That would easily let a slave cascade too, as it could relay
the WAL stream and serve the base backup too... or have a special WAL
repository software with the same interface as a normal master, but
having a choice of base backups and WAL streams. Plus, a slave-in-control
approach would also allow multiple slaves at the same time for a given
master...


I totally agree with that.


The only downside of this approach is that the slave machine needs a
full postgres super user connection to the master. That could be a
security problem in certain scenarios. 


I think the master-slave protocol needs to be separate from the normal 
FE/BE protocol, with commands like "send a new base backup", or 
"subscribe to new WAL that's generated". A master-slave connection isn't 
associated with any individual database, for example. We can keep the 
permissions required for establishing a master-slave connection 
different from super-userness. In particular, while the slave will be 
able to read all data from the whole cluster, by receiving it in the WAL 
and base backups, it doesn't need to be able to modify anything in the 
master.



The master-centric scenario needs
a connection in the other direction, which might be seen as more secure,
I don't know for sure...


Which one initiates the connection, the master or slave, is a different 
question. I believe we've all assumed that it's the slave that connects 
to the master, and I think that makes the most sense.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Andrew Dunstan



Heikki Linnakangas wrote:

Simon Riggs wrote:

Built-in? Why? I mean make base backup using rsync. That way only
changed data blocks need be migrated, so much faster.


Yes, what I meant is that it would be cool to have that functionality 
built-in, so that you wouldn't need to configure extra rsync scripts 
and authentication etc.




If this were a nice pluggable library I'd agree, but AFAIK it's not, and 
I don't see great value in reinventing the wheel.


cheers

andrew



Re: [HACKERS] Move src/tools/backend/ to wiki

2008-09-12 Thread Alvaro Herrera
Peter Eisentraut wrote:
> Alvaro Herrera wrote:
>> However I wonder how much value there really is in the developer's FAQ,
>> considering that some answers seem rather poor.  For example the
>> answer on ereport() was wrong, and nobody ever pointed it out.  The
>> answer on palloc/pfree is very incomplete too.
>
> I think the developer's FAQ has essentially been unmaintained for many  
> years.  I think we should gradually migrate the content to other wiki  
> pages and eventually drop the FAQ.

In a way, this has already started; some answers are now just
pointers to other wiki pages or to the docs.

I think getting rid of the FAQ completely is not necessarily a good
idea; it seems useful as a collection of interesting questions.  Moving
the contents to new pages is probably OK.  Also, as the answers mature
on the Wiki, perhaps it'd be possible to move them to the SGML docs (and
reduce the Wiki answer to just a pointer).

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Heikki Linnakangas

Simon Riggs wrote:

Built-in? Why? I mean make base backup using rsync. That way only
changed data blocks need be migrated, so much faster.


Yes, what I meant is that it would be cool to have that functionality 
built-in, so that you wouldn't need to configure extra rsync scripts and 
authentication etc.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Move src/tools/backend/ to wiki

2008-09-12 Thread David Fetter
On Fri, Sep 12, 2008 at 10:29:54AM +0300, Peter Eisentraut wrote:
> Alvaro Herrera wrote:
>> However I wonder how much value there really is in the developer's
>> FAQ, considering that some answers seem rather poor.  For example
>> the answer on ereport() was wrong, and nobody ever pointed it out.
>> The answer on palloc/pfree is very incomplete too.
>
> I think the developer's FAQ has essentially been unmaintained for
> many  years.  I think we should gradually migrate the content to
> other wiki  pages and eventually drop the FAQ.

+1 :)

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] New FSM patch

2008-09-12 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Let me describe this test case first:
- The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
with random values.


What's the distribution of the random values, exactly?  In particular,
how do the request sizes compare to available free space per-page?


The request and "old avail" sizes are in the range of 0-8100
(random() % 8100).
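
In outline, as a standalone toy that shows only the distribution (the
real test drives the backend's RecordAndGetPageWithFreeSpace):

#include <stdio.h>
#include <stdlib.h>

/*
 * Request and "old avail" values drawn uniformly from 0..8099 bytes,
 * i.e. anywhere up to roughly a whole 8K page; seed chosen arbitrarily.
 */
int
main(void)
{
    srandom(42);
    for (int i = 0; i < 8; i++)
    {
        long old_avail = random() % 8100;
        long request = random() % 8100;

        printf("old_avail=%4ld request=%4ld\n", old_avail, request);
    }
    return 0;
}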



The design intent for FSM was that we'd not bother to record pages that
have less free space than the average request size, so as to (usually)
avoid the problem of uselessly searching a lot of entries.  I can't tell
whether your test case models that behavior at all.  If it does then
there may be something else that needs fixing.


Probably not. The test case starts with a table that's practically 
empty, so all pages are put into the FSM.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] New FSM patch

2008-09-12 Thread Heikki Linnakangas

Zdenek Kotala wrote:
It looks like there are a lot of lock issues on FSM pages. When the
number of FSM pages is increased, the number of collisions is lower.
That is probably why 2 clients speed up significantly between 33MB and
333MB.


Yes, that's what I thought as well. With table size under 33 MB, the FSM 
consists of just one (bottom-level) FSM page,



I think it is time to take DTrace ;-).
Do you have any machine with DTrace support? 


No.

If not, send me your test
suite and I will try to run it on my machine.


Sure, here you are. tests.sh is the main script to run. You'll need to
adjust the paths there for your environment.


As it is, the tests will take many hours to run, so you'll probably want 
to modify tests.sh and pgbenchtests.sh to reduce the number of 
iterations. At least on my server, the variance in the numbers was very 
small, so repeating the tests 4 times in tests.sh is probably overkill.


Thanks!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


fsmtest.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 17:24 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote:
> > Hmm, built-in rsync capability would be cool. Probably not in the first 
> > phase, though..
> 
> We have it for WAL shipping, in the form of the GUC "archive_command"  :)
> 
> Why not add full_backup_command?

I see the current design is all master-push centered, i.e. the master is
in control of everything WAL related. That makes it hard to create a
slave which is simply pointed to the server and takes all its data from
there...

Why not have a design where the slave is in control of its own data?
I mean the slave could ask for the base files (possibly through a
special function deployed on the master), then ask for the WAL stream
and so on. That would easily let a slave cascade too, as it could relay
the WAL stream and serve the base backup too... or have a special WAL
repository software with the same interface as a normal master, but
having a choice of base backups and WAL streams. Plus, a slave-in-control
approach would also allow multiple slaves at the same time for a given
master...

The way it would work would be something like:

* configure the slave with a postgres connection to the master;
* the slave will connect and set up some meta data on the master
identifying itself and telling the master to keep the WAL needed by this
slave, and also get some meta data about the master's details if needed;
* the slave will call a special function on the master and ask for the
base backup to be streamed (potentially compressed with special
knowledge of postgres internals);
* once the base backup is streamed, or possibly in parallel, ask for
streaming of the WAL files;
* when the base backup is finished, start applying the WAL stream,
which was cached in the meantime, while its streaming continues;
* keep the master updated about the state of the slave, so the master
can know if it needs to keep the WAL files which were not yet streamed;
* in case of network error, the slave connects again and starts to
stream the WAL from where it was left;
* in case of extended network outage, the master could decide to
unsubscribe the slave when a certain time-out happened;
* when the slave finds itself unsubscribed after a longer disconnection,
it could ask for a new base backup based on differences only... some
kind of built in rsync thingy;

The only downside of this approach is that the slave machine needs a
full postgres super user connection to the master. That could be a
security problem in certain scenarios. The master-centric scenario needs
a connection in the other direction, which might be seen as more secure,
I don't know for sure...

Cheers,
Csaba.





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Simon Riggs

On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:

> I think we'll need an option to specify a maximum for the number of WAL 
> files to keep around. The DBA should set that to the size of the WAL 
> drive, minus some safety factor.
> 
> > It should be clear that to make this work you must run with a base
> > backup that was derived correctly on the current master. You can do that
> > by re-copying everything, or you can do that by just shipping changed
> > blocks (rsync etc). So I don't see a problem in the first place.
> 
> Hmm, built-in rsync capability would be cool. Probably not in the first 
> phase, though..

Built-in? Why? I mean make base backup using rsync. That way only
changed data blocks need be migrated, so much faster.
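
For example (host and paths invented; rsync's delta algorithm is what
keeps re-copies down to the changed portions):

-- on the master, in psql, bracket the copy with a backup label:
SELECT pg_start_backup('standby_base');
-- then at a shell on the standby:
--   rsync -a --delete master:/srv/pgdata/ /srv/pgdata/
-- and back on the master:
SELECT pg_stop_backup();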

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 13:54 +0200, Csaba Nagy wrote:
> > > I think that enabling long-running queries this way is both
> > > low-hanging
> > > fruit (or at least medium-height-hanging ;) ) and also consistent to
> > > PostgreSQL philosophy of not replication effort. As an example we trust
> > > OS's file system cache and don't try to write our own.
> > 
> > I have again questions (unfortunately I only have questions usually):
> > 
> > * how will the buffers keep 2 different versions of the same page ?
> 
> As the FS snapshot is mounted as a different directory, it will have
> its own buffer pages.

RelFileNode has a spcNode which can be redirected to a temporary
filesystem snapshot. So its relatively easy to imagine redirecting
access to a table from its normal tablespace to the snapshot one.

> To conserve RAM, one could go to FS snapshot files only in case main
> pages have LSN too big to be trusted.

That would mean you'd need to do two I/Os, one to get the newly changed
page to read its LSN and another to get the old COW copy. We might waste
buffer space with that technique also. Since we'd be trying to avoid
caching bigger tables anyway (since 8.3), it seems easier to just go
straight to the COW copy.

So I think it's fairly straightforward to support temporary snapshots in
Postgres, with creation/destruction handled in the way you say.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] pg_regress inputdir

2008-09-12 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:

> > I think the appropriate interface would be adding another option to 
> > pg_regress called --workdir or --tempdir, which defaults to PWD, and 
> > write the converted sql files there, and then look for the sql files to 
> > execute in workdir/sql and in inputdir/sql.  In some way, this copies 
> > the vpath search mechanism.
> 
> That would be required to make pg_regress run as far as its own
> facilities are concerned.  But I think Alvaro is worried about something
> at a higher level: the regression test process as a whole has some
> directory layout assumptions built into it, particularly in regards
> to where to find .so's.  If we don't have a workable solution for that
> it's not really going to help to change pg_regress like this.

Maybe the same work dir can be used as a place to store the shared
objects.  I think all it'd require is to change @abs_builddir@ to point
to workdir.

That should work fine as long as nobody attempts to put the workdir in
some mount point that's marked noexec (which is somewhat common with
/tmp).

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-09-12 Thread D'Arcy J.M. Cain
On Fri, 12 Sep 2008 06:53:55 +1000
"Brendan Jurd" <[EMAIL PROTECTED]> wrote:
> Josh has assigned your patch to me for an initial review.

And me.

> First up I'd like to say that this is a really nice upgrade.
> Shielding a running server from reloading a bogus conf file makes a
> whole lot of sense.

Yes.

> The patch applied cleanly to HEAD, compiled fine on amd64 gentoo and

I had a small problem compiling.  I'm not sure why it would be
different for me.  I run NetBSD -current.  Here is the error:

../../../src/include/libpq/hba.h:51: error: field 'addr' has incomplete
type

I was able to fix this by adding the following line to hba.h:

#include "libpq/pqcomm.h"/* needed for struct sockaddr_storage */

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/          |  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)     |  what's for dinner.



Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> 
> > It should be clear that to make this work you must run with a base
> > backup that was derived correctly on the current master. You can do that
> > by re-copying everything, or you can do that by just shipping changed
> > blocks (rsync etc). So I don't see a problem in the first place.
> 
> Hmm, built-in rsync capability would be cool. Probably not in the first 
> phase, though..

We have it for WAL shipping, in the form of the GUC "archive_command"  :)

Why not add full_backup_command?
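
In postgresql.conf terms the analogy would be something like the lines
below; full_backup_command is only a proposal in this thread, and the %d
placeholder for the data directory is invented by analogy with
archive_command's %p/%f:

archive_command = 'cp %p /mnt/server/wal_archive/%f'      # exists today
full_backup_command = 'rsync -a %d standby:/srv/pgdata'   # hypothetical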

--
Hannu





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Heikki Linnakangas

Simon Riggs wrote:

If we were going to recover from failed-over standby back to original
master just via WAL logs we would need all of the WAL files from the
point of failover. So you'd need to be storing all WAL file just in case
the old master recovers. I can't believe doing that would be the common
case, because its so impractical and most people would run out of disk
space and need to delete WAL files.


Depends on the transaction volume and database size of course. It's
actually not any different from the scenario where the slave goes
offline for some reason. You have the same decision there of how
long to keep the WAL files in the master, in case the slave wakes up.


I think we'll need an option to specify a maximum for the number of WAL 
files to keep around. The DBA should set that to the size of the WAL 
drive, minus some safety factor.



It should be clear that to make this work you must run with a base
backup that was derived correctly on the current master. You can do that
by re-copying everything, or you can do that by just shipping changed
blocks (rsync etc). So I don't see a problem in the first place.


Hmm, built-in rsync capability would be cool. Probably not in the first 
phase, though..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] New FSM patch

2008-09-12 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Let me describe this test case first:
> - The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
> with random values.

What's the distribution of the random values, exactly?  In particular,
how do the request sizes compare to available free space per-page?

The design intent for FSM was that we'd not bother to record pages that
have less free space than the average request size, so as to (usually)
avoid the problem of uselessly searching a lot of entries.  I can't tell
whether your test case models that behavior at all.  If it does then
there may be something else that needs fixing.

regards, tom lane



Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread David Fetter
On Fri, Sep 12, 2008 at 08:53:39AM +0100, Gregory Stark wrote:
> 
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > Joshua Drake <[EMAIL PROTECTED]> writes:
> >> I think something like:
> >
> >> psql: FATAL:  Ident authentication failed for user "root"
> >> HINT: http://www.postgresql.org/docs/8.3/static/client-authentication.html
> >
> >> Would be nice.
> ...
> >
> > Or to put it even more baldly: this is not an area in which you
> > can improve matters significantly with five minutes' thought and a
> > one-line patch.  It would take some actual work.
> 
> Actually I think there is a problem with the original message that
> could be improved. The problem is that "Ident" is a
> Postgres-specific term that a newbie DBA is unlikely to understand.
> What's worse it's an ambiguous term that is easily misunderstood to
> refer to the rfc1413 ident protocol which Postgres might or might
> not be using.

For "Ident" auth, we need to split that functionality into two
separate pieces: socket and network, and add descriptive error
messages for each.

> I would suggest instead describing it using more generic terminology
> though offhand I'm not sure what that would be.  A detail line could
> include the Postgres-specific authentication method which failed.

Excellent idea :)

> I do think it's true that the pg_hba setup is far more complex than
> it has to be and that that's a bigger problem than a simple error
> message too.

Agreed.  Any ideas as to how we might address this?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_regress inputdir

2008-09-12 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Alvaro Herrera wrote:
>> I tried to run the test from another directory with this patch
>> installed, and found that it didn't work because it's replacing
>> @abs_builddir@ in the input files improperly (to the current path; it
>> should be using the output dir path, I think)

> I think the appropriate interface would be adding another option to 
> pg_regress called --workdir or --tempdir, which defaults to PWD, and 
> write the converted sql files there, and then look for the sql files to 
> execute in workdir/sql and in inputdir/sql.  In some way, this copies 
> the vpath search mechanism.

That would be required to make pg_regress run as far as its own
facilities are concerned.  But I think Alvaro is worried about something
at a higher level: the regression test process as a whole has some
directory layout assumptions built into it, particularly in regards
to where to find .so's.  If we don't have a workable solution for that
it's not really going to help to change pg_regress like this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread David Fetter
On Fri, Sep 12, 2008 at 10:08:56AM +0200, Markus Schiltknecht wrote:
> Hi,
>
> David Fetter wrote:
>> I'm all for something, and that's a much better something.  What we
>> have now--nothing--actively distresses newbies for no good reason.
>>
>> I don't know how many people we've lost right at that point, but
>> the number has to be high, as most people don't just hop into IRC
>> with their problem.
>
> Maybe something much more specific, i.e. triggering only if one
> tried to connect via localhost or unix sockets, and only if one
> tried to authenticate as 'root' without a password.

It's not the root part that confuses people, but the entire message.

> The hint should IMO say something like: "The default superuser is
> postgres, not root". Something that's useful for this specific case
> and doesn't disturb others. And something that's public
> knowledge, which  any reasonably serious attacker already knows
> anyway.

I, too, disagree with the "security by obscurity" approach to auth
error messages.  A system cracker will not be deterred by any such
thing, but a new user can easily be.

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote:
> > * how will the buffers keep 2 different versions of the same page?
> 
> As the FS snapshot is mounted as a different directory, it will have
its own buffer pages.

Lack of knowledge about this shows my ignorance about the implementation
of the page buffers...

> > * how will you handle the creation of snapshots? 
> 
> Probably an external command, possibly a shell script, 
> similar to the current "archive_command" for WAL copying
> 
> maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command'
[snip]
> Yes, the simplest one being an external command. As FS snapshots are
> supposed to happen not too often, at least not every second, just having
> external commands may be enough.

You could restrict the creation of snapshots to some minimum amount of
time between them, and maybe also restrict the maximum number of
concurrent snapshots possible. Then if the time limit (as calculated
from the last open snapshot) is currently not met, any new query could
reuse that last snapshot. The time intervals do not need to be evenly
distributed, BTW; it could be a function of the already opened snapshots,
like increasing the minimum interval exponentially with the number of
already opened snapshots. That would help to tie more long-running
queries to just a few snapshots.
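
A minimal sketch of that throttling (all names invented; the base
interval and the factor of 2 are arbitrary knobs):

#include <math.h>
#include <time.h>

static double
min_snapshot_interval(int open_snapshots, double base_seconds)
{
    /* the minimum wait doubles with each snapshot already open */
    return base_seconds * pow(2.0, (double) open_snapshots);
}

static int
may_create_snapshot(time_t last_created, int open_snapshots,
                    double base_seconds)
{
    /* if this returns false, the caller reuses the last snapshot */
    return difftime(time(NULL), last_created)
           >= min_snapshot_interval(open_snapshots, base_seconds);
}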

> > I hope my continuous questioning is not too annoying...
> 
> On the contrary, much appreciated. :)

Ok, then I'll continue :-) I would like to see this feature succeed, but
there's slim chance I'll ever code well in C...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-12 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Heikki Linnakangas wrote:
I've also been working on a low level benchmark using a C user-defined 
function that exercises just the FSM, showing the very raw CPU 
performance vs. current implementation. More on that later, but ATM it 
looks like the new implementation can be faster or slower than the 
current one, depending on the table size.


Let me describe this test case first:
- The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
with random values. There's no activity to the heap. In normal usage, 
the time spent in RecordAndGetPageWithFreeSpace is minuscule compared to the 
heap and index updates that cause RecordAndGetPageWithFreeSpace to be called.
- WAL was placed on a RAM drive. This is of course not how people set up 
their database servers, but the point of this test was to measure CPU 
speed and scalability. The impact of writing extra WAL is significant 
and needs to be taken into account, but that's a separate test and 
discussion, and needs to be considered in comparison to the WAL written 
by heap and index updates.


That said, the test results are pretty interesting.

I ran the test using a custom scripts with pgbench. I ran it with 
different table sizes, and with 1 or 2 clients, on CVS HEAD and a 
patched version. The unit is "thousands of RecordAndGetPageWithFreeSpace 
calls per second":


Table size    Patched            CVS HEAD
              1 clnt    2 clnts  1 clnt    2 clnts
8 kB          4.59      3.45     62.83     26.85
336 kB        13.85     6.43     41.8      16.55
3336 kB       14.96     6.3      22.45     10.55
33336 kB      14.85     6.56     5.44      4.08
333336 kB     14.48     11.04    0.79      0.74
3333336 kB    12.68     11.5     0.07      0.07
33333336 kB   7.67      5.37     0.05      0.05

The big surprise to me was that performance on CVS HEAD tanks as the 
table size increases. One possible explanation is that searches for X 
bytes of free space, for a very high X, will not find any matches, and 
the current FSM implementation ends up scanning through the whole FSM 
list for that relation.


Another surprise was how badly both implementations scale. On CVS HEAD, 
I expected the performance to be roughly the same with 1 and 2 clients, 
because all access to the FSM is serialized on the FreeSpaceLock. But 
adding the 2nd client not only didn't help, but it actually made the 
performance much worse than with a single client. Context switching or 
cache line contention, perhaps? 


The new FSM implementation shows the 
same effect, which was an even bigger surprise. At table sizes > 32 MB, 
the FSM no longer fits on a single FSM page, so I expected almost a 
linear speed up with bigger table sizes from using multiple clients. 
That's not happening, and I don't know why. Although, going from 33MB to 
333 MB, the performance with 2 clients almost doubles, but it still 
doesn't exceed that with 1 client.


It looks like there are a lot of lock issues on FSM pages. When the number of 
FSM pages is increased, the number of collisions is lower. That is probably why 2 
clients speed up significantly between 33MB and 333MB. I think it is time to 
take out DTrace ;-).
Do you have any machine with DTrace support? If not, send me your test suite and I 
will try to run it on my machine.


Zdenek





--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 13:54 +0200, Csaba Nagy wrote:
> > I think that enabling long-running queries this way is both
> > low-hanging
> > fruit (or at least medium-height-hanging ;) ) and also consistent with the
> > PostgreSQL philosophy of not duplicating effort. As an example we trust the
> > OS's file system cache and don't try to write our own.
> 
Again I have questions (unfortunately I usually only have questions):
> 
> * how will the buffers keep 2 different versions of the same page?

As the FS snapshot is mounted as a different directory, it will have
its own buffer pages.

To conserve RAM, one could go to FS snapshot files only in case main
pages have LSN too big to be trusted.

> * how will you handle the creation of snapshots? 

Probably an external command, possibly a shell script, 
similar to the current "archive_command" for WAL copying

maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command'

> I guess there's no portable and universal API for that (just guessing), 
> or is there some POSIX thing which may or may not be supported by the specific FS? 
> So if the FS does not support it, you skip the snapshot step? 

Yes, if FS snapshots are not supported, we fall back to either
inconsistent reads or killing long-running queries.

> And if there's no universal API, will it be handled by plugins providing 
> a specified API for snapshotting the FS?

Yes, the simplest one being an external command. As FS snapshots are
supposed to happen not too often, at least not every second, just having
external commands may be enough.

> I hope my continuous questioning is not too annoying...

On the contrary, much appreciated. :)


Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Fri, 2008-09-12 at 12:25 +0100, Gregory Stark wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> 
> > 3. Ignore problem
> > Effects:
> > * Long running queries on standby...
> >Have no effect on primary
> >Do not delay apply of WAL records on standby
> > * Queries on standby give inconsistent answers in some cases, though it
> > doesn't generate any messages to show that inconsistency occurred. Acceptable
> > for read-only and insert-only tables only.
> 
> This seems like a non-starter. 

It works, and is proposed as a non-default option since a number of
people have independently said to me that this would be
acceptable/preferred.

> Your comment about read-only and insert-only tables only seems to make sense
> if you assume there are other tables being updated simultaneously. Otherwise
> of course there would be no WAL records for tuple removals.

Yeh, you got it.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
> I think that enabling long-running queries this way is both
> low-hanging
> fruit (or at least medium-height-hanging ;) ) and also consistent with the
> PostgreSQL philosophy of not duplicating effort. As an example we trust the
> OS's file system cache and don't try to write our own.

Again I have questions (unfortunately I usually only have questions):

* how will the buffers keep 2 different versions of the same page?
* how will you handle the creation of snapshots? I guess there's no portable 
and universal API for that (just guessing), or is there some POSIX thing which 
may or may not be supported by the specific FS? So if the FS does not support it, you 
skip the snapshot step? And if there's no universal API, will it be handled by 
plugins providing a specified API for snapshotting the FS?

I hope my continuous questioning is not too annoying...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NDirectFileRead and Write

2008-09-12 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> BTW, we have the word 'Direct' in ShowBufferUsage().
> Can I keep it as-is?

> ShowBufferUsage()
> !   Shared blocks: ...
> !   Local  blocks: ...
> !   Direct blocks:  read,  written

Good point.  Seems like it should be changed, but I am not sure to what.
Ideas anyone?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 12:31 +0100, Richard Huxton wrote:
> There was a suggestion (Simon - from you?) of a transaction voluntarily
> restricting itself to a set of tables.

While thinking about how easy it would be for the DBA to specify the set
of tables a single query is accessing, first I thought that it should be
straightforward enough to look at the query itself for that. Then I thought:
what about views, rules, triggers, user functions etc.? All those have
the potential to access more than you see in the query itself. And then
the actually interesting question: what will the slave do with views,
rules, triggers? I guess triggers are out of the question to be
executed; what about rules? Probably they must also be ignored... user
functions will probably get errors if they try to update something...
Views should probably function correctly.

So in any case the functionality available for querying slaves would be
less than for the primary. This is probably good enough for most
purposes...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 12:31 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 09:45 +0100, Simon Riggs wrote:
> > On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> > > Gregory Stark wrote:
> > > > b) vacuum on the server which cleans up a tuple the slave has in scope 
> > > > has to
> > > >block WAL reply on the slave (which I suppose defeats the purpose of 
> > > > having
> > > >a live standby for users concerned more with fail-over latency).
> > > 
> > > One problem with this, BTW, is that if there's a continuous stream of 
> > > medium-length transaction in the slave, each new snapshot taken will 
> > > prevent progress in the WAL replay, so the WAL replay will advance in 
> > > "baby steps", and can fall behind indefinitely. As soon as there's a 
> > > moment that there's no active snapshot, it can catch up, but if the 
> > > slave is seriously busy, that might never happen.
> > 
> > It should be possible to do mixed mode.
> > 
> > Stall WAL apply for up to X seconds, then cancel queries. Some people
> > may want X=0 or low, others might find X = very high acceptable (Merlin
> > et al).
> 
> Or even milder version.
> 
> * Stall WAL apply for up to X seconds, 
> * then stall new queries, let old ones run to completion (with optional
> fallback to canceling after Y sec), 
> * apply WAL. 
> * Repeat.

Now that I have thought a little more about delegating the keeping of old
versions to filesystem-level (ZFS, XFS+LVM) snapshots, I'd like to
propose the following:

 0. run queries and apply WAL freely until WAL application would 
remove old rows.

 1. stall applying WAL for up to N seconds

 2. stall starting new queries for up to M seconds

 3.  if some backends are still running long queries, then 

   3.1. make filesystem level snapshot (FS snapshot), 
   3.2. mount the FS snapshot somewhere (maybe as data.at.OldestXmin 
in parallel to $PGDATA) and 
   3.3 hand this mounted FS snapshot over to those backends

 4. apply WAL

 5. GoTo 0.

Of course we need to do the filesystem-level snapshots in step 3 only if the
long-running queries don't already have one given to them. Or maybe also
if they are running in READ COMMITTED mode and have acquired a new PG
snapshot since they got their FS snapshot, in which case they need a new one.

Also, snapshots need to be reference counted, so we can unmount and
destroy them once all their users have finished.
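
A sketch of that reference counting (structures and names invented;
the destroy command would be something like the proposed
'destroy_fs_snapshot_command'):

#include <stdio.h>
#include <stdlib.h>

typedef struct FsSnapshot
{
    char mountpoint[1024];  /* e.g. the "data.at.OldestXmin" mount */
    int  refcount;          /* backends still reading this snapshot */
} FsSnapshot;

static void
release_fs_snapshot(FsSnapshot *snap, const char *destroy_cmd_fmt)
{
    char cmd[1152];

    if (--snap->refcount > 0)
        return;

    /* last user: unmount and destroy via the configured command */
    snprintf(cmd, sizeof(cmd), destroy_cmd_fmt, snap->mountpoint);
    if (system(cmd) != 0)
        fprintf(stderr, "snapshot destroy command failed for %s\n",
                snap->mountpoint);
}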

I think that enabling long-running queries this way is both low-hanging
fruit (or at least medium-height-hanging ;) ) and also consistent with the
PostgreSQL philosophy of not duplicating effort. As an example we trust the
OS's file system cache and don't try to write our own.


Hannu










-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Richard Huxton
Gregory Stark wrote:
> In that case the problem is dealing with different usage patterns on different
> tables. There might be a way to solve just that use case such as deferring WAL
> records for those tables. That doesn't guarantee inter-table data consistency
> if there were other queries which read from those tables and updated other
> tables based on that data though. Perhaps there's a solution for that too
> though.

There was a suggestion (Simon - from you?) of a transaction voluntarily
restricting itself to a set of tables. That would obviously reduce the
impact of all the options where the accessed tables weren't being
updated (where update = vacuum + HOT if I've got this straight).

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Gregory Stark
Simon Riggs <[EMAIL PROTECTED]> writes:

> 3. Ignore problem
> Effects:
> * Long running queries on standby...
>Have no effect on primary
>Do not delay apply of WAL records on standby
> * Queries on standby give inconsistent answers in some cases, though it
> doesn't generate any messages to show that inconsistency occurred. Acceptable
> for read-only and insert-only tables only.

This seems like a non-starter. 

Your comment about read-only and insert-only tables only seems to make sense
if you assume there are other tables being updated simultaneously. Otherwise
of course there would be no WAL records for tuple removals.

In that case the problem is dealing with different usage patterns on different
tables. There might be a way to solve just that use case such as deferring WAL
records for those tables. That doesn't guarantee inter-table data consistency
if there were other queries which read from those tables and updated other
tables based on that data though. Perhaps there's a solution for that too
though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 13:53 +0300, Hannu Krosing wrote:

> 4. Slave keeps copies of removed pages or rows when WAL apply removes
> old versions.
> 
> Possible ways to do this
> 
>  * inside Slave - have some backup store tied to OldestXmin intervals
> 
>   * variant 1 - have one global store, accessed through shared mem
>   * variant 2 - present removed pages to interested backends and 
> let them (decide to) keep them 
> 
>  * outside Slave - having file system keep old snapshots as long as
> needed, still must tie to OldestXmin intervals, but most of work done by
> storage layer (SAN or overlay file system).

Possible options for "outside Slave" filesystem snapshotting:

ZFS ( http://en.wikipedia.org/wiki/ZFS ) is very likely usable 

Linux LVM + XFS may be usable -
http://arstechnica.com/articles/columns/linux/linux-20041013.ars

Possibly also http://en.wikipedia.org/wiki/Btrfs .


Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Simon Riggs

On Fri, 2008-09-12 at 00:03 +0900, Fujii Masao wrote:

> In my procedure, old WAL files are copied by the admin using scp, rsync
> or another external tool. So, I don't think that my procedure makes the
> problem more difficult. Since there are many setup cases, we should
> not leave all procedures to postgres, I think.

So the procedure is

1. Startup WALReceiver to begin receiving WAL
2. Do some manual stuff
3. Initiate recovery

So either

* WALReceiver is not started by the postmaster. 
I don't think it's acceptable that WALReceiver is not under the
postmaster. You haven't reduced the number of failure modes by doing
that, you've just swept the problem under the carpet and pretended it's
not Postgres' problem.

* Postgres startup requires some form of manual process, as an
**intermediate** stage.

I don't think either of those is acceptable. It must just work.

Why not:
1. Same procedure as Warm Standby now
a) WAL archiving to standby starts
b) base backup

2. Startup standby, with additional option to stream WAL. WALReceiver
starts, connects to Primary. Primary issues log switch. Archiver turns
itself off after sending that last file. WALSender starts streaming
current WAL immediately after log switch.

3. Startup process on standby begins reading WAL from point mentioned by
backup_label. When it gets to last logfile shipped by primary's
archiver, it switches to reading WAL files written by WALReceiver.

So all automatic. Uses existing code. Synchronous replication starts
immediately. Also has the advantage that we do not get WAL bloat on
primary. Configuration is almost identical to current Warm Standby, so
little change for existing Postgres sysadmins.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 11:19 +0100, Simon Riggs wrote:
> On Thu, 2008-09-11 at 01:07 +0100, Simon Riggs wrote:
> > Transaction snapshots is probably the most difficult problem for Hot
> > Standby to resolve. 
> 
> In summary of thread so far:
> 
> When queries on standby run for significantly longer than longest
> queries on primary, some problems can occur. Various people have argued
> for these responses to the problems:
> 
> 1. Master uses Standby's OldestXmin
...
> 2. Master ignores Standby's OldestXmin
...
> 3. Ignore problem
...

> Summary OK for everyone?

Maybe we should at least mention option 4.

4. Slave keeps copies of removed pages or rows when WAL apply removes
old versions.

Possible ways to do this

 * inside Slave - have some backup store tied to OldestXmin intervals

  * variant 1 - have one global store, accessed through shared mem
  * variant 2 - present removed pages to interested backends and 
let them (decide to) keep them 

 * outside Slave - having file system keep old snapshots as long as
needed, still must tie to OldestXmin intervals, but most of work done by
storage layer (SAN or overlay file system).

-
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 12:18 +0100, Gregory Stark wrote:

> Is there any middle ground or brilliant ways to get the best of both worlds?

Possibly. Nobody has commented yet on the other ideas on the post
itself.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 18:17 +0300, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > I think that this case would often happen. So, we should establish a certain
> > solution or procedure for the case where the TLI of the master doesn't match
> > TLI of the slave. If we only allow the case where TLI of both servers is the
> > same, the configuration after failover always needs to get the base backup
> > on the new master. It's unacceptable for many users. But, I think that it's
> > the role of admin or external tools to copy history files to the slave from
> > the master.
> 
> Hmm. There's more problems than the TLI with that. For the original 
> master to catch up by replaying WAL from the new slave, without 
> restoring from a full backup, the original master must not write to disk 
> *any* WAL that hasn't made it to the slave yet. That is certainly not 
> true for asynchronous replication, but it also throws off the idea of 
> flushing the WAL concurrently to the local disk and to the slave in 
> synchronous mode.
> 
> I agree that having to get a new base backup to get the old master to catch 
> up with the new master sucks, so I hope someone sees a way around that.

If we were going to recover from a failed-over standby back to the original
master just via WAL logs, we would need all of the WAL files from the
point of failover. So you'd need to be storing all WAL files just in case
the old master recovers. I can't believe doing that would be the common
case, because it's so impractical and most people would run out of disk
space and need to delete WAL files.

It should be clear that to make this work you must run with a base
backup that was derived correctly on the current master. You can do that
by re-copying everything, or you can do that by just shipping changed
blocks (rsync etc). So I don't see a problem in the first place.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NDirectFileRead and Write

2008-09-12 Thread ITAGAKI Takahiro

Tom Lane <[EMAIL PROTECTED]> wrote:

> * Let's rename them BufFileReadCount and BufFileWriteCount to reflect
> their actual purpose.

It looks good. No one uses the NDirectFileRead/Write variables now,
so we can rename them freely.

BTW, we have the word 'Direct' in ShowBufferUsage().
Can I keep it as-is?

ShowBufferUsage()
!   Shared blocks: ...
!   Local  blocks: ...
!   Direct blocks:  read,  written


> * In any case I agree that the current arrangement
> with execdebug.h declaring variables defined in bufmgr.c is just weird.

Ok.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 01:07 +0100, Simon Riggs wrote:
> Transaction snapshots is probably the most difficult problem for Hot
> Standby to resolve. 

In summary of thread so far:

When queries on standby run for significantly longer than longest
queries on primary, some problems can occur. Various people have argued
for these responses to the problems:

1. Master uses Standby's OldestXmin
Effects: 
* Long running queries on standby...
   Can delay row removal on primary
   Do not delay apply of WAL records on standby
* Queries on standby give consistent answers in all cases.

2. Master ignores Standby's OldestXmin
Effects:
* Long running queries on standby...
   Have no effect on primary
   Can delay apply of WAL records on standby
* Queries on standby give consistent answers in all cases.

3. Ignore problem
Effects:
* Long running queries on standby...
   Have no effect on primary
   Do not delay apply of WAL records on standby
* Queries on standby give inconsistent answers in some cases, though it
doesn't generate any messages to show that inconsistency occurred. Acceptable
for read-only and insert-only tables only.

Hot Standby should provide all 3 responses as options.

(1) would be implemented by sending the Standby's OldestXmin to the primary.
Snapshots would not be sent from the primary; they will be derived locally
from transactions currently being applied.

(2) would be implemented by setting a timer. When the Startup process has
waited for more than "redo_max_delay"/"max_lag_delay" (SIGHUP) we cancel
queries. If the timeout is 0 we aggressively cancel queries without
waiting.

(3) would be implemented using read_consistency = on (default) | off, a
USERSET parameter. When read_consistency = off we ignore the backend's
xmin when deciding whether to wait before applying WAL or not.
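
A minimal sketch of option (2)'s timer logic (the helper functions are
invented placeholders; only redo_max_delay comes from the proposal above):

#include <stdbool.h>
#include <time.h>
#include <unistd.h>

static bool conflicting_queries_exist(void) { return false; }
static void cancel_conflicting_queries(void) { }

static void
stall_then_cancel(double redo_max_delay)
{
    time_t start = time(NULL);

    while (conflicting_queries_exist())
    {
        /* a delay of 0 cancels immediately, as described */
        if (difftime(time(NULL), start) >= redo_max_delay)
        {
            cancel_conflicting_queries();
            break;
        }
        usleep(100000);     /* recheck every 100 ms */
    }
}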

Summary OK for everyone?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 17:04 +0300, Heikki Linnakangas wrote:
> Csaba Nagy wrote:
> > and that means in fact that if you have
> > continuously overlapping small transactions, the "blocking horizon"
> > could be even blocked forever, as there'll always be a query running,
> > and the new queries will always have the snapshot of the currently
> > running ones because WAL recovery is stalled... 
> 
> Hmm, no I don't think the WAL recovery can become completely stalled. To 
> completely stop progressing, we'd need to take a new snapshot that 
> includes transaction X, and at the same time be blocked on a vacuum 
> record that vacuums a tuple that's visible to transaction X. I don't 
> think that can happen, because for such a scenario to arise, in the 
> corresponding point in time in the master, there would've been a 
> scenario where the vacuum would've removed a tuple that would have been 
>   visible to a newly starting transaction. Which can't happen. I think..

ISTM Csaba is correct. If WAL apply blocks, the xids don't change, so the
snapshots never change, and so WAL stays blocked. 

The only way out of that is to store up removals for particular blocks,
but that's complicated.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 15:17 +0200, Jochem van Dieten wrote:
> On Thu, Sep 11, 2008 at 2:07 AM, Simon Riggs wrote:
> > Transaction snapshots is probably the most difficult problem for Hot
> > Standby to resolve.
> 
> >  * remotely from primary node
> >  * locally on the standby node
> >
> > If we derive a snapshot locally, then we will end up with a situation
> > where the xmin of the local snapshot precedes the xmin of the primary
> > node. When this occurs it will then be possible for WAL records to
> > arrive on the standby that request removal of rows that a transaction
> > might wish to see. Preventing that situation can be done by either
> > deferring WAL apply or by cancelling queries.
> 
> Which operations can request row removal? Isn't that just specific
> operations that have their own 'is this safe to remove' calculations
> anyway (i.e. vacuum and HOT prune)?
> 
> What I am thinking about is a design where the primary node would
> regularly push an OldestXMin into the WAL, the WAL apply process on
> the standby nodes pushes it into shared memory and the backends keep
> an OldestMasterXMin in shared memory. The standby nodes then regularly
> pushes back the oldest OldestMasterXMin from all backends to the
> master. Vacuum and HOT prune could then base their calculations on an
> OldestXMin that is not the OldestXMin of the master itself, but of the
> master and the standby nodes. That way removal of records that are
> still visible on one of the standby nodes is prevented on the master
> instead of worked around on the standby nodes.
> 
> The obvious downside would be bloat on the master (which could get out
> of hand if a slave is a few days behind due to a long report), but I
> think in terms of visibility and consistency this would work. Or am I
> completely misunderstanding the problem?

Yes, just sending the xmin from standby to primary is the best way of
having primary and standby work together. I will include this, thanks.
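
For illustration, the combined horizon calculation might look like this
(a sketch with invented names; xids compared as plain integers here, so
real code would still need to handle xid wraparound):

static unsigned int
effective_oldest_xmin(unsigned int master_xmin,
                      const unsigned int *standby_xmins, int nstandbys)
{
    unsigned int result = master_xmin;
    int i;

    /* vacuum honours the oldest xmin among master and all standbys */
    for (i = 0; i < nstandbys; i++)
        if (standby_xmins[i] < result)
            result = standby_xmins[i];

    return result;
}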

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest patches mostly assigned ... status

2008-09-12 Thread KaiGai Kohei

Josh Berkus wrote:

Hackers,

At this point, almost all patches have been assigned to reviewers.  If 
you submitted a patch and don't get feedback by Saturday, take a look at 
who's in the "reviewers" column and send them a query.  Since I have no 
way to track when patches are assigned to reviewers, I have no idea if 
some of them are sitting on their hands.

   :
Note that patches need not have only one reviewer!  If you have time, 
please take on testing some of the more complex patches, especially:

Column-level Permissions
Common Table Expressions
SE-PostgreSQL patches


I updated my patches:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00859.php

It contains a rebase to the latest CVS HEAD, a new hook to prevent
bypassing access controls, and a small fix.

In addition, I have attached several short explanations of the key points
needed to understand the big patch. I wrote the documentation of
SE-PostgreSQL, but that is not a description of *the patch* itself.
If reviewers have any unclear things, please feel free to ask me.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-12 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
I've also been working on a low level benchmark using a C user-defined 
function that exercises just the FSM, showing the very raw CPU 
performance vs. current implementation. More on that later, but ATM it 
looks like the new implementation can be faster or slower than the 
current one, depending on the table size.


Let me describe this test case first:
- The test program calls RecordAndGetPageWithFreeSpace in a tight loop, 
with random values. There's no activity to the heap. In normal usage, 
the time spent in RecordAndGetPageWithFreeSpace is minuscule compared to the 
heap and index updates that cause RecordAndGetPageWithFreeSpace to be called.
- WAL was placed on a RAM drive. This is of course not how people set up 
their database servers, but the point of this test was to measure CPU 
speed and scalability. The impact of writing extra WAL is significant 
and needs to be taken into account, but that's a separate test and 
discussion, and needs to be considered in comparison to the WAL written 
by heap and index updates.


That said, the test results are pretty interesting.

I ran the test using a custom scripts with pgbench. I ran it with 
different table sizes, and with 1 or 2 clients, on CVS HEAD and a 
patched version. The unit is "thousands of RecordAndGetPageWithFreeSpace 
calls per second":


Table size    Patched            CVS HEAD
              1 clnt    2 clnts  1 clnt    2 clnts
8 kB          4.59      3.45     62.83     26.85
336 kB        13.85     6.43     41.8      16.55
3336 kB       14.96     6.3      22.45     10.55
33336 kB      14.85     6.56     5.44      4.08
333336 kB     14.48     11.04    0.79      0.74
3333336 kB    12.68     11.5     0.07      0.07
33333336 kB   7.67      5.37     0.05      0.05

The big surprise to me was that performance on CVS HEAD tanks as the 
table size increases. One possible explanation is that searches for X 
bytes of free space, for a very high X, will not find any matches, and 
the current FSM implementation ends up scanning through the whole FSM 
list for that relation.
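
A sketch of that suspected worst case (simplified, invented structures;
not the actual FSM code): a request larger than any recorded page's
free space visits every entry before giving up.

typedef struct
{
    unsigned int blkno;
    unsigned int avail;     /* bytes free on that page */
} FsmEntry;

static int
find_page_with_space(const FsmEntry *entries, int n, unsigned int request)
{
    int i;

    for (i = 0; i < n; i++)         /* O(n) when nothing matches */
        if (entries[i].avail >= request)
            return (int) entries[i].blkno;

    return -1;                      /* caller extends the relation */
}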


Another surprise was how badly both implementations scale. On CVS HEAD, 
I expected the performance to be roughly the same with 1 and 2 clients, 
because all access to the FSM is serialized on the FreeSpaceLock. But 
adding the 2nd client not only didn't help, but it actually made the 
performance much worse than with a single client. Context switching or 
cache line contention, perhaps? The new FSM implementation shows the 
same effect, which was an even bigger surprise. At table sizes > 32 MB, 
the FSM no longer fits on a single FSM page, so I expected almost a 
linear speed up with bigger table sizes from using multiple clients. 
That's not happening, and I don't know why. Although, going from 33MB to 
333 MB, the performance with 2 clients almost doubles, but it still 
doesn't exceed that with 1 client.


Going from 3 GB to 33 GB, the performance of the new implementation 
drops. I don't know why, I think I'll run some more tests with big table 
sizes to investigate that a bit more. The performance in the old 
implementation stays almost the same at that point; I believe that's 
because max_fsm_pages is exceeded at that point.


All in all, this isn't a very realistic test case, but it's interesting 
nevertheless. I'm happy with the performance of the new FSM on this 
test, as it's in the same ballpark as the old one, even though it's not 
quite what I expected.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 09:45 +0100, Simon Riggs wrote:
> On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> > Gregory Stark wrote:
> > > b) vacuum on the server which cleans up a tuple the slave has in scope 
> > > has to
> > >block WAL reply on the slave (which I suppose defeats the purpose of 
> > > having
> > >a live standby for users concerned more with fail-over latency).
> > 
> > One problem with this, BTW, is that if there's a continuous stream of 
> > medium-length transaction in the slave, each new snapshot taken will 
> > prevent progress in the WAL replay, so the WAL replay will advance in 
> > "baby steps", and can fall behind indefinitely. As soon as there's a 
> > moment that there's no active snapshot, it can catch up, but if the 
> > slave is seriously busy, that might never happen.
> 
> It should be possible to do mixed mode.
> 
> Stall WAL apply for up to X seconds, then cancel queries. Some people
> may want X=0 or low, others might find X = very high acceptable (Merlin
> et al).

Or even milder version.

* Stall WAL apply for up to X seconds, 
* then stall new queries, let old ones run to completion (with optional
fallback to canceling after Y sec), 
* apply WAL. 
* Repeat.

-
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Hannu Krosing
On Fri, 2008-09-12 at 11:21 +0200, Csaba Nagy wrote:
> On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote:
> > If you request a block, we check to see whether there is a lookaside
> > copy of it prior to the tuple removals. We then redirect the block
> > request to a viewpoint relation's block. Each viewpoint gets a separate
> > relfilenode. We do the switcheroo while holding cleanup lock on block.
> 
> Wouldn't it make sense to also have a hint bit on the pages which are
> copied away? 

There is no need for a hint bit (and one bit would not be enough anyway,
as we may need to keep multiple versions).

An LSN of the last row-removal (VACUUM or HOT-pruning) would serve as a hint
to start digging around in the hash tables.

It seems, though, that you may have to look into several disk snapshots
to find the page you need.
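
A sketch of that LSN filter (all names invented): consult the lookaside
store only when the current page was changed by row removal after the
viewpoint was frozen.

typedef unsigned long long Lsn;

static const char *
lookaside_lookup(unsigned int blkno, Lsn viewpoint_lsn)
{
    /* would probe the per-viewpoint hash table(s) here */
    (void) blkno;
    (void) viewpoint_lsn;
    return NULL;
}

static const char *
page_for_viewpoint(const char *current_page, unsigned int blkno,
                   Lsn page_removal_lsn, Lsn viewpoint_lsn)
{
    if (page_removal_lsn <= viewpoint_lsn)
        return current_page;    /* no removals since the viewpoint */

    return lookaside_lookup(blkno, viewpoint_lsn);
}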

> Then instead of looking up a hash table, you would first
> check that bit, and if it's not set you won't look up the hash table at
> all.  Then when you clean up the "lookaside copies" you clear those bits
> too...
> 
> That would probably perform somewhat better for reading than always
> looking up a potentially big hash table, and the cost of setting the
> hint is probably a lot less than copying away the page in the first
> place. Resetting the hint bit might be a bit more expensive.
> 
> Cheers,
> Csaba.
> 
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-09-12 Thread Magnus Hagander
Brendan Jurd wrote:
> Hi Magnus,
> 
> Josh has assigned your patch to me for an initial review.
> 
> First up I'd like to say that this is a really nice upgrade.
> Shielding a running server from reloading a bogus conf file makes a
> whole lot of sense.

Hi!

Thanks for your review.

> The patch doesn't include any updates to documentation.  I humbly
> suggest that the change to the ident map (making "sameuser" the
> default) should be mentioned both in the Manual itself, and in the
> sample conf file comments.  Here are a few sites that may be in want
> of an update:

Yes, absolutely. Thanks for the couple of pointers, that'll help me not
to miss them :-)

> The new error messages are good, but I wonder if they could be
> improved by using DETAIL segments.  The guidelines on error message
> style say that the primary message should be terse and make a
> reasonable effort to fit on one line.  For example, this:
> 
> LOG:  invalid IP address "a.b.c.d" in file
> "/home/direvus/src/postgres/inst/data/pg_hba.conf" line 77: Name or
> service not known
> 
> Could be rewritten as something like this:
> 
> LOG:  invalid IP address "a.b.c.d" in auth config: Name or service not 
> known
> DETAIL: In file "/home/direvus/src/postgres/inst/data/pg_hba.conf", line 
> 77

Makes a lot of sense, I'll go about making that change.
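
For example, the split might look roughly like this (variable names are
invented; it assumes the backend's ereport/errdetail machinery and a
getaddrinfo-style error code):

static void
report_bad_hba_address(const char *token, int gai_code,
                       const char *hba_path, int line_num)
{
    /* within the backend, "postgres.h" provides ereport() */
    ereport(LOG,
            (errmsg("invalid IP address \"%s\" in auth config: %s",
                    token, gai_strerror(gai_code)),
             errdetail("In file \"%s\", line %d.", hba_path, line_num)));
}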

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)

2008-09-12 Thread KaiGai Kohei

Hello,

The latest SE-PostgreSQL patches are updated here:

[1/4] 
http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1005.patch
[2/4] 
http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1005.patch
[3/4] 
http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1005.patch
[4/4] 
http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1005.patch

They contain a rebase to the CVS HEAD, because the previous ones no longer
apply correctly due to progress in the base version.
Rest of changes are here:
 - A new PGACE hook: pgaceIsAllowExecutorRunHook().
   It makes it possible to control overriding of ExecutorRun_hook, because
   several important hooks are invoked from standard_ExecutorRun().
 - T_SEvalItem-related equal() functions are added to nodes/equalfuncs.c.
   # I had previously left them unimplemented.
 - Fix a typo in src/include/security/pgace.h


BTW, I thought I should show an overview of the patch to help reviewers.
The main patch ([1/4]) is especially big and contains new concepts.

The following explanation covers several key concepts of SE-PostgreSQL.
I hope it can reduce the load on reviewers.

Needless to say, please feel free to ask me anything. :-)

Thanks,


Security hooks
--
We call it the PostgreSQL Access Control Extension (PGACE).
The newly added "src/include/security/pgace.h" declares these hooks as
inline functions. If HAVE_SELINUX is available at build time, they have
a valid implementation that invokes functions to make access control
decisions. When the SE-PostgreSQL feature is disabled at build time or
run time, they do not change any existing behavior.

These hooks have a prefix of "pgace", like pgaceHeapTupleInsert().
That hook is invoked just before inserting a new tuple into a relation,
so the SE-PostgreSQL subsystem can make its decision.

A hook's arguments provide the information needed to make a decision.
pgaceHeapTupleInsert(), for example, has four arguments, such as the
target Relation object and the newly inserted HeapTuple.

The specification of each hook is described in
"src/include/security/pgace.h".


Security attribute management
-
We need a security attribute on each tuple to use as the basis of access
control decisions. SELinux calls it a "security context", and most
security-aware operating systems have a similar idea, called a label.
It is represented in a text format like "system_u:object_r:etc_t:s0", and
a characteristic of it is that many objects tend to share a single
security context.

We store the text-form security attribute in the "pg_security" system
catalog and put an alternate key (the oid of the pg_security row) on each
tuple, because putting raw string data on individual tuples would be an
unacceptable approach.

The alternate key is stored in the tail of the HeapTupleHeader, as "oid" is.
This field is valid when t_infomask has the HEAP_HASSECURITY bit set.

   HeapTupleHeader
  +------------------+
  |        :         |
  +------------------+
  |    t_infomask    |
  +------------------+      pg_security system catalog
  |      t_hoff      |     +-------+---------------------------------------------+
  +------------------+     |  oid  |                  seclabel                   |
  |        :         |     +-------+---------------------------------------------+
  |        :         |     | 16389 | system_u:object_r:sepgsql_table_t:s0        |
  +------------------+     | 16401 | system_u:object_r:sepgsql_proc_t:s0         |
  | Oid security_id  o---> | 16402 | system_u:object_r:sepgsql_secret_table_t:s0 |
  +------------------+     |   :   |                     :                       |
  | Oid object_id    |     +-------+---------------------------------------------+
  +------------------+
  |   Data field     |
  |        :         |

The alternate key is just an internal representation, so we have to translate
it to/from the text format when communicating with the in-kernel SELinux, or
when exporting/importing data.

Note that security attributes are also assigned to tuples within system
catalogs. A security attribute of a tuple within pg_attribute represents a
column's security attribute, and is used for column-level access controls,
for example.

src/backend/security/pgaceCommon.c has functions to translate them:

  char *pgaceLookupSecurityLabel(Oid security_id);
  Oid   pgaceLookupSecurityId(char *security_label);

When a new security_label is given and not found in pg_security,
pgaceLookupSecurityId() tries to insert a new tuple into pg_security and
returns its object id as an alternate key.
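
A toy stand-in for that lookup-or-insert behaviour (everything here is
invented for illustration; the real code goes through the pg_security
catalog and its caches):

#include <string.h>

struct sec_entry { unsigned int oid; const char *label; };

static struct sec_entry sec_table[256];
static int sec_count = 0;
static unsigned int next_oid = 16384;

static unsigned int
lookup_security_id(const char *label)
{
    int i;

    for (i = 0; i < sec_count; i++)
        if (strcmp(sec_table[i].label, label) == 0)
            return sec_table[i].oid;

    /* unseen label: "insert" a new pg_security row */
    sec_table[sec_count].oid = next_oid++;
    sec_table[sec_count].label = label;
    return sec_table[sec_count++].oid;
}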

Two more similar functions are also declared:
  char *pgaceSidToSecurityLabel(Oid security_id)
  Oid   pgaceSecurityLabelToSid(char *security_label)
These make it possible to translate between a cosmetic text format and an
internal security identifier.
An example of the cosmetic format is:
  unconfined_u:unconfined_r:unconfined_t:SystemLow-SystemHigh
 
We have a case when the `pg_security` system catalog is not available

Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote:
> If you request a block, we check to see whether there is a lookaside
> copy of it prior to the tuple removals. We then redirect the block
> request to a viewpoint relation's block. Each viewpoint gets a separate
> relfilenode. We do the switcheroo while holding cleanup lock on block.

Wouldn't it make sense to also have a hint bit on the pages which are
copied away? Then instead of looking up a hash table, you would first
check that bit, and if it's not set you won't look up the hash table at
all.  Then when you clean up the "lookaside copies" you clear those bits
too...

That would probably perform somewhat better for reading than always
looking up a potentially big hash table, and the cost of setting the
hint is probably a lot less than copying away the page in the first
place. Resetting the hint bit might be a bit more expensive.

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_regress inputdir

2008-09-12 Thread Peter Eisentraut

Alvaro Herrera wrote:

Jorgen Austvik - Sun Norway wrote:

The attached patch makes pg_regress write converted files to
<outputdir>/sql and <outputdir>/expected, which is one way to make it read
and write to the same directory. Tested on Solaris x86 with pgsql "make  
check" and standalone.


Okay, so this patch does change it in a way that it still works, but
what else do you need to be able to run the test from another directory?
I tried to run the test from another directory with this patch
installed, and found that it didn't work because it's replacing
@abs_builddir@ in the input files improperly (to the current path; it
should be using the output dir path, I think)

So maybe this is a step in the right direction, but ISTM you need a
slightly larger patch for it to be actually useful.

If I am not making sense, then maybe I am not understanding what you
mean by running it standalone.  In that case, please explain.



I think the appropriate interface would be adding another option to 
pg_regress called --workdir or --tempdir, which defaults to PWD, and 
write the converted sql files there, and then look for the sql files to 
execute in workdir/sql and in inputdir/sql.  In some way, this copies 
the vpath search mechanism.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> Gregory Stark wrote:
> > b) vacuum on the server which cleans up a tuple the slave has in scope has 
> > to
> >block WAL reply on the slave (which I suppose defeats the purpose of 
> > having
> >a live standby for users concerned more with fail-over latency).
> 
> One problem with this, BTW, is that if there's a continuous stream of 
> medium-length transaction in the slave, each new snapshot taken will 
> prevent progress in the WAL replay, so the WAL replay will advance in 
> "baby steps", and can fall behind indefinitely. As soon as there's a 
> moment that there's no active snapshot, it can catch up, but if the 
> slave is seriously busy, that might never happen.

It should be possible to do mixed mode.

Stall WAL apply for up to X seconds, then cancel queries. Some people
may want X=0 or low, others might find X = very high acceptable (Merlin
et al).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What is d2mdir?

2008-09-12 Thread Peter Eisentraut

Tom Lane wrote:

Abhijit Menon-Sen <[EMAIL PROTECTED]> writes:

At 2008-09-02 15:10:23 +0300, [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] sgml]$ make man



As Alvaro noted recently, you need to use "make man D2MDIR=/some/path".


I see it's been like that for quite some time, but still it seems pretty
bogus.  Why isn't configure handling this?  If there's some good reason
not to automate it, why isn't it documented in the "Building The
Documentation" appendix?


I have updated the documentation accordingly.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 11:38 +0300, Hannu Krosing wrote:
> On Thu, 2008-09-11 at 09:24 +0300, Heikki Linnakangas wrote:
> 
> > I like the idea of acquiring snapshots locally in the slave much more. 
> > As you mentioned, the options there are to defer applying WAL, or cancel 
> > queries. 
> 
> More exotic ways to defer applying WAL include using some smart
> filesystems to get per-backend data snapshots, using either
> copy-on-write overlay filesystems and filesystem- or disk-level
> snapshots.

That's certainly possible.

That would mean we maintain a single consistent viewpoint of the whole
database for a period of time. The frozen viewpoint could move forwards
by operator command, or we might keep multiple frozen views.

We can have a LookasideHash table in memory that keeps track of which
blocks have had rows removed from them since the "frozen view" was
taken.

If you request a block, we check to see whether there is a lookaside
copy of it prior to the tuple removals. We then redirect the block
request to a viewpoint relation's block. Each viewpoint gets a separate
relfilenode. We do the switcheroo while holding cleanup lock on block.

So effectively we would be adding "frozen snapshot" technology to
Postgres. Although we would want to do copy-on-clean rather than
copy-on-write, which could mean significantly better performance.

That might be encumbered by patent in some way.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Simon Riggs

On Thu, 2008-09-11 at 17:58 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > So part of the handshake between
> > primary and standby must be "what is your recentxmin?". The primary will
> > then use the lower/earliest of the two.
> 
> Even then, the master might already have vacuumed away tuples that are 
> visible to an already running transaction in the slave, before the slave 
> connects. Presumably the master doesn't wait for the slave to connect 
> before starting to accept new connections.

Yep, OK.

> >> As you mentioned, the options there are to defer applying WAL, or cancel 
> >> queries. I think both options need the same ability to detect when 
> >> you're about to remove a tuple that's still visible to some snapshot, 
> >> just the action is different. We should probably provide a GUC to 
> >> control which you want.
> > 
> > I don't see any practical way of telling whether a tuple removal will
> > affect a snapshot or not. Each removed row would need to be checked
> > against each standby snapshot. Even if those were available, it would be
> > too costly. 
> 
> How about using the same method as we use in HeapTupleSatisfiesVacuum? 
> Before replaying a vacuum record, look at the xmax of the tuple 
> (assuming it committed). If it's < slave's OldestXmin, it can be 
> removed. Otherwise not. Like HeapTupleSatisfiesVacuum, it's 
> conservative, but doesn't require any extra bookkeeping.
> 
> And vice versa: if we implement the more precise book-keeping, with all 
> snapshots in shared memory or something, we might as well use it in 
> HeapTupleSatisfiesVacuum. That has been discussed before, but it's a 
> separate project.

Tuple removals earlier than the slave's OldestXmin are easy, that's true.
I'm not sure what you had in mind for "Otherwise not"? 

Maybe you mean "stop applying WAL until slave's OldestXmin is > tuple
removal xid". Not sure, reading other subthreads of this post.

I think it's possible to defer removal actions on specific blocks only,
but that is an optimisation that's best left for a while.

BTW, tuple removals would need a cleanup lock on a block, just as they
do on the master server. So WAL apply can be delayed momentarily by
pinholders anyway, whatever we do.

> > It was also suggested we might take the removed rows and put them in a
> > side table, but that makes me think of the earlier ideas for HOT and so
> > I've steered clear of that.
> 
> Yeah, that's non-trivial. Basically a whole new, different 
> implementation of MVCC, but without changing any on-disk formats.
> 
> BTW, we haven't talked about how to acquire a snapshot in the slave. 
> You'll somehow need to know which transactions have not yet committed, 
> but will in the future. In the master, we keep track of in-progress 
> transaction in the ProcArray, so I suppose we'll need to do the same in 
> the slave. Very similar to prepared transactions, actually. I believe 
> the Abort records, which are not actually needed for normal operation, 
> become critical here. The slave will need to put an entry to ProcArray 
> for any new XLogRecord.xl_xid it sees in the WAL, and remove the entry 
> at a Commit and Abort record. And clear them all at a shutdown record.

I wouldn't do it like that.

I was going to maintain a "current snapshot" in shared memory, away from
the PROCARRAY. Each time we see a TransactionId we check whether it has
already been seen; if not, we insert it. When a transaction commits or
aborts we remove the stated xid. If we see a shutdown checkpoint we
clear the array completely. When query backends want a snapshot they
just read the array. It doesn't matter whether those transactions
commit or abort, since their changes can't be seen by queries until
commit anyway.

The reason for doing it this way is that the PROCARRAY may be full of
query backends, so having dummy backends in there as well sounds
confusing.
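
As a very rough sketch of that shared structure (the fixed-size array
and every name here are illustrative assumptions, nothing more):

#include "postgres.h"
#include "access/transam.h"
#include "storage/spin.h"

#define MAX_STANDBY_XIDS 4096   /* illustrative bound */

/* Hypothetical shared-memory "current snapshot" for the standby. */
typedef struct StandbySnapshotData
{
    slock_t       mutex;
    int           numXids;
    TransactionId xids[MAX_STANDBY_XIDS];
} StandbySnapshotData;

/* Record an xid seen in WAL, unless we have seen it already. */
static void
standby_note_xid(StandbySnapshotData *snap, TransactionId xid)
{
    int         i;

    SpinLockAcquire(&snap->mutex);
    for (i = 0; i < snap->numXids; i++)
        if (TransactionIdEquals(snap->xids[i], xid))
            break;
    if (i == snap->numXids && snap->numXids < MAX_STANDBY_XIDS)
        snap->xids[snap->numXids++] = xid;
    SpinLockRelease(&snap->mutex);
}

/* Remove an xid when its commit or abort record is replayed. */
static void
standby_remove_xid(StandbySnapshotData *snap, TransactionId xid)
{
    int         i;

    SpinLockAcquire(&snap->mutex);
    for (i = 0; i < snap->numXids; i++)
    {
        if (TransactionIdEquals(snap->xids[i], xid))
        {
            snap->xids[i] = snap->xids[--snap->numXids];
            break;
        }
    }
    SpinLockRelease(&snap->mutex);
}

A shutdown checkpoint would just zero numXids, and a query backend
would copy the array out under the spinlock to build its snapshot.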

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Review] pgbench duration option

2008-09-12 Thread Magnus Hagander
Tom Lane wrote:
> I wrote:
>> Are people satisfied that the Windows part of the patch is okay?
> 
> I went ahead and applied this patch after some trivial stylistic fixes.
> The buildfarm will soon tell us if the WIN32 part of the patch compiles,
> but not whether it works --- would someone double-check the functioning
> of the -T switch on Windows?

Confirmed, it works fine for me. I also took a quick look at the win32
specific code committed and see nothing else that would be off in it
(there are two handles that are never closed, but there's no point in
caring in pgbench, since that code will never run in a loop).
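
For anyone curious, the basic shape of such a timer on win32 is a
one-shot thread that sleeps out the -T duration and then raises a flag
the main loop polls -- roughly like this sketch (not the committed
code; names are illustrative):

#ifdef WIN32
#include <windows.h>
#include <stdio.h>

static volatile int timer_exceeded = 0;

/* Thread body: sleep for the run duration, then signal the main loop. */
static DWORD WINAPI
win32_timer(LPVOID arg)
{
    int         duration_secs = *(int *) arg;

    Sleep((DWORD) duration_secs * 1000);
    timer_exceeded = 1;
    return 0;
}

static void
setalarm(int seconds)
{
    static int  duration;

    duration = seconds;
    /* The returned handle is never closed -- harmless, since this runs
     * exactly once per pgbench invocation, as noted above. */
    if (CreateThread(NULL, 0, win32_timer, &duration, 0, NULL) == NULL)
        fprintf(stderr, "could not create timer thread\n");
}
#endif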

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread Markus Wanner

Hi,

David Fetter wrote:

> I'm all for something, and that's a much better something.  What we
> have now--nothing--actively distresses newbies for no good reason.
>
> I don't know how many people we've lost right at that point, but the
> number has to be high, as most people don't just hop into IRC with
> their problem.


Maybe something much more specific, i.e. a hint triggering only if one 
tried to connect via localhost or unix sockets, and only if one tried 
to authenticate as 'root' without a password.


The hint should IMO say something like: "The default superuser is 
postgres, not root". Something that's useful for this specific case and 
doesn't get in the way in others. And something that's public 
knowledge, which any reasonably serious attacker already knows anyway.


Maybe also point out that the unix user name is chosen by default, 
assuming that most of these users didn't explicitly type 'root' and are 
wondering where that 'root' user came from.
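
For illustration, on the server side such a hint could presumably be a
one-line errhint() in the ident failure path -- a sketch only, with
both the placement and the wording entirely open:

/* Sketch: hint attached where ident authentication fails for "root".
 * Whether this belongs in the server or in libpq is exactly the open
 * question in this thread. */
if (strcmp(port->user_name, "root") == 0)
    ereport(FATAL,
            (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
             errmsg("Ident authentication failed for user \"root\""),
             errhint("The default superuser is \"postgres\", not \"root\"; "
                     "the database user name defaults to your unix user "
                     "name.")));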


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better auth errors from libpq

2008-09-12 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Joshua Drake <[EMAIL PROTECTED]> writes:
>> I think something like:
>
>> psql: FATAL:  Ident authentication failed for user "root"
>> HINT: http://www.postgresql.org/docs/8.3/static/client-authentication.html
>
>> Would be nice.
...
>
> Or to put it even more baldly: this is not an area in which you can
> improve matters significantly with five minutes' thought and a one-line
> patch.  It would take some actual work.

Actually I think there is one aspect of the original message that could
be improved. The problem is that "Ident" is a Postgres-specific term
that a newbie DBA is unlikely to understand. What's worse, it's an
ambiguous term that is easily misunderstood as referring to the rfc1413
ident protocol, which Postgres may or may not actually be using.

I would suggest instead describing it using more generic terminology,
though offhand I'm not sure what that would be. A detail line could
include the Postgres-specific authentication method that failed.
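
Something along these lines, say (purely a sketch of the shape, not
proposed wording):

/* Sketch: generic primary message, with the Postgres-specific
 * method relegated to a detail line. */
ereport(FATAL,
        (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
         errmsg("authentication failed for user \"root\""),
         errdetail("The \"ident\" authentication method configured in "
                   "pg_hba.conf rejected the connection.")));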

I do think it's true that the pg_hba setup is far more complex than it
has to be, and that that's a bigger problem than the wording of a
single error message.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move src/tools/backend/ to wiki

2008-09-12 Thread Peter Eisentraut

Alvaro Herrera wrote:

> However I wonder how much value there really is in the developer's FAQ,
> considering that some answers seem rather poor.  For example the
> answer on ereport() was wrong, and nobody ever pointed it out.  The
> answer on palloc/pfree is very incomplete too.


The developer's FAQ has essentially been unmaintained for many years.  
I think we should gradually migrate the content to other wiki pages and 
eventually drop the FAQ.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers