Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 Long story short, this is kind of a mess.

I agree that it's a bit of a mess.

 What I would propose is that we create a new source
 file somewhere (maybe utils/cache), move all of the other functions of
 this type there, give them standardized names, and provide them all
 with an argument that specifies whether an error is to be thrown if
 the object doesn't exist.

Something which has come up in the past is that putting all the
functions that do the same kind of thing, but operate on different
types of objects, into the same backend file/area ends up meaning that
such an area has an untold amount of knowledge about everything.
Additionally, what *does* go into said area has tons of things that are
only related by the kind of operation, not because they actually have
anything to do with each other.

This was one of the complaints levied at the approach for moving all the
ACL checking into one place.  I think it would be good to have a
consistent naming/calling scheme for these various functions, but I'm
not sure that moving them all to the same place makes sense.

Thanks,

Stephen




Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Long story short, this is kind of a mess.

 ... I think it would be good to have a
 consistent naming/calling scheme for these various functions, but I'm
 not sure that moving them all to the same place makes sense.

I'm with Stephen on this one.  I agree that standardizing the function
names and behavior would be a good idea, but don't try to put them all
in one place.

BTW, the plain-name cases should be const char *, else some callers
will have to cast away const.  You could possibly make an argument for
const List * in the qualified-name cases, but we don't do that
anywhere else so I think it'd just look funny here (and would require
internally casting away const, too).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Robert Haas
On Sun, May 23, 2010 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Long story short, this is kind of a mess.

 ... I think it would be good to have a
 consistent naming/calling scheme for these various functions, but I'm
 not sure that moving them all to the same place makes sense.

 I'm with Stephen on this one.  I agree that standardizing the function
 names and behavior would be a good idea, but don't try to put them all
 in one place.

Some of the existing functions are in lsyscache.c, some are in files
in the commands directory, and some are in files in the parser
directory; also, even between commands and parser, not every object
type has its own file.  It would be nice to bring some consistency to
where the functions are located as well as what they do.  Any thoughts
on how to achieve that?

 BTW, the plain-name cases should be const char *, else some callers
 will have to cast away const.  You could possibly make an argument for
 const List * in the qualified-name cases, but we don't do that
 anywhere else so I think it'd just look funny here (and would require
 internally casting away const, too).

Makes sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-23 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
   While reviewing bfba40e2c7b3909d3de13bd1b83b7e85fa8dfec2 (mmm, we like
   git diff -p), I noted that c.h is already included by both extern.h
   and ecpg.header through postgres_fe.h.  Given this and that we're
   already doing a lot of similar #define's there (unlike in those other
   files), I felt c.h was a more appropriate place.  Putting it in c.h
   also means we don't have to duplicate that code.

Ugh.  Moving that to c.h doesn't render it not junk code.  (For one
thing, it will not operate as intended if you haven't previously
#included limits.h, which in fact is not included in c.h.)

If we need this we should do it properly with autoconf.

regards, tom lane



Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 23, 2010 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm with Stephen on this one.  I agree that standardizing the function
 names and behavior would be a good idea, but don't try to put them all
 in one place.

 Some of the existing functions are in lsyscache.c, some are in files
 in the commands directory, and some are in files in the parser
 directory; also, even between commands and parser, not every object
 type has its own file.  It would be nice to bring some consistency to
 where the functions are located as well as what they do.  Any thoughts
 on how to achieve that?

I think both Stephen and I are saying we don't see merit in that.
Moving around pre-existing functions won't accomplish much except
causing include-list churn.  Let's just standardize the names/APIs
and be done.

regards, tom lane



Re: [HACKERS] [PATCH] Move 'long long' check to c.h

2010-05-23 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Ugh.  Moving that to c.h doesn't render it not junk code.  (For one
 thing, it will not operate as intended if you haven't previously
 #included limits.h, which in fact is not included in c.h.)

Doh.

 If we need this we should do it properly with autoconf.

My autoconf foo is not very good, but once I finish a couple of other
things I'll take a shot at doing it that way.

Thanks!

Stephen




Re: [HACKERS] beta testing - planner bug - ERROR: XX000: failed to build any 2-way joins

2010-05-23 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 postgres=# explain SELECT name
 FROM person
 WHERE name IN (
SELECT name FROM person p
LEFT JOIN person_data ON p.id = person_data.id);
 ERROR:  failed to build any 2-way joins

Fixed, thanks.

regards, tom lane



Re: [HACKERS] beta testing - pg_upgrade bug fix - double free

2010-05-23 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 --- a/contrib/pg_upgrade/check.c
 +++ b/contrib/pg_upgrade/check.c
 @@ -154,7 +154,6 @@ issue_warnings(migratorContext *ctx, char *sequence_script_file_name)
  			ctx->new.bindir, ctx->new.port, sequence_script_file_name,
  			ctx->logfile);
  	unlink(sequence_script_file_name);
 -	pg_free(sequence_script_file_name);
  	check_ok(ctx);
  }

Done, thanks.

regards, tom lane



Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Robert Haas
On Sun, May 23, 2010 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, May 23, 2010 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm with Stephen on this one.  I agree that standardizing the function
 names and behavior would be a good idea, but don't try to put them all
 in one place.

 Some of the existing functions are in lsyscache.c, some are in files
 in the commands directory, and some are in files in the parser
 directory; also, even between commands and parser, not every object
 type has its own file.  It would be nice to bring some consistency to
 where the functions are located as well as what they do.  Any thoughts
 on how to achieve that?

 I think both Stephen and I are saying we don't see merit in that.
 Moving around pre-existing functions won't accomplish much except
 causing include-list churn.  Let's just standardize the names/APIs
 and be done.

Where do we put the new functions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 23, 2010 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think both Stephen and I are saying we don't see merit in that.
 Moving around pre-existing functions won't accomplish much except
 causing include-list churn.  Let's just standardize the names/APIs
 and be done.

 Where do we put the new functions?

Probably in files that are already concerned with each object type.

regards, tom lane



Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Robert Haas
On Sun, May 23, 2010 at 1:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, May 23, 2010 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think both Stephen and I are saying we don't see merit in that.
 Moving around pre-existing functions won't accomplish much except
 causing include-list churn.  Let's just standardize the names/APIs
 and be done.

 Where do we put the new functions?

 Probably in files that are already concerned with each object type.

Not every object type has a file, and the existing functions are split
across three different directories, sometimes in files that don't
really pertain to the object type being dealt with.  I think this is
going to be difficult to maintain if we intentionally spread out the
parallel code across essentially the entire backend.  But I guess I
can code it up and we can argue about it then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] mapping object names to role IDs

2010-05-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Not every object type has a file, and the existing functions are split
 across three different directories, sometimes in files that don't
 really pertain to the object type being dealt with.  I think this is
 going to be difficult to maintain if we intentionally spread out the
 parallel code across essentially the entire backend.  But I guess I
 can code it up and we can argue about it then.

The only thing that seems really significantly parallel is the error
message to be issued for object-not-found.  I would suggest maybe
putting the code in lsyscache.c, except that lsyscache functions
generally are not expected to throw error on object-not-found.

As for not every object type has a file, there is certainly code
someplace that would be calling these functions.  Wherever (the
preponderance of) such calls are would be an appropriate place for the
function, IMO.

regards, tom lane



[HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Jan Wieck

In some systems (data warehousing, replication), the order of commits is
important, since that is the order in which changes have become visible.
This information could theoretically be extracted from the WAL, but
scanning the entire WAL just to extract this tidbit of information would
be excruciatingly painful.

The following is based on ideas that emerged during last week's PGCon.
Consider it an implementation proposal, if you like.

We introduce a new set of files. The files represent segments of an
infinite array of structures. The present segments are the available
window of data. Similar to CLOG files, the individual file name will
represent the high bits of a serial number, the offset of the record
inside the file represents the low bits of the serial.

The system will have postgresql.conf options for enabling/disabling the
whole shebang, how many shared buffers to allocate for managing access
to the data and to define the retention period of the data based on data
volume and/or age of the commit records.

Each record of the Transaction Commit Info consists of

 txid  xci_transaction_id
 timestamptz   xci_begin_timestamp
 timestamptz   xci_commit_timestamp
 int64 xci_total_rowcount

32 bytes total.

CommitTransaction() inside of xact.c will call a function that inserts
a new record into this array. The operation will most of the time be
nothing more than taking a spinlock and adding the record to shared memory.
All the data for the record is readily available, does not require
further locking and can be collected locally before taking the spinlock.
The begin_timestamp is the transaction's idea of CURRENT_TIMESTAMP, the
commit_timestamp is what CommitTransaction() just decided to write into
the WAL commit record, and the total_rowcount is the sum of inserted,
updated and deleted heap tuples during the transaction, which should be
easily available from the statistics collector, unless row stats are
disabled, in which case the datum would be zero.

The function will return the sequence number which CommitTransaction()
in turn will record in the WAL commit record together with the
begin_timestamp. While both the begin and the commit timestamp
are crucial to determine what data a particular transaction should have
seen, the row count is not and will not be recorded in WAL.

Checkpoint handling will call a function to flush the shared buffers.
Together with this, the information from WAL records will be sufficient
to recover this data (except for row counts) during crash recovery.

Exposing the data will be done via a set returning function. The SRF
takes two arguments: the maximum number of rows to return and the last
serial number processed by the reader. The advantage of such an SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.

Purging of the data will be possible in several different ways.
Autovacuum will call a function that drops segments of the data that are
outside the postgresql.conf configuration with respect to maximum age
or data volume. There will also be a function reserved for superusers to
explicitly purge the data up to a certain serial number.


Comments, suggestions?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Greg Sabino Mullane



 Exposing the data will be done via a set returning function. The SRF
 takes two arguments: the maximum number of rows to return and the last
 serial number processed by the reader. The advantage of such an SRF is that
 the result can be used in a query that right away delivers audit or
 replication log information in transaction commit order. The SRF can
 return an empty set if no further transactions have committed since, or
 an error if data segments needed to answer the request have already been
 purged.

In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005231646
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Ron Mayer
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 So... can we get back to coming up with a reasonable
 definition,
 
 (1) no access to system calls (including file and network I/O)

If a PL has file access to its own sandbox (similar to what
Flash seems to do in web browsers), could that be considered
trusted?





Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Robert Haas
On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote:
 The system will have postgresql.conf options for enabling/disabling the
 whole shebang, how many shared buffers to allocate for managing access
 to the data and to define the retention period of the data based on data
 volume and/or age of the commit records.

It would be nice if this could just be managed out of shared_buffers
rather than needing to configure a separate pool just for this
feature.  But, I'm not sure how much work that is, and if it turns out
to be too ugly then I'd say it's not a hard requirement.  In general,
I think we talked during the meeting about the desirability of folding
specific pools into shared_buffers rather than managing them
separately, but I'm not aware that we have any cases where we do that
today so it might be hard (or not).

 Each record of the Transaction Commit Info consists of

     txid          xci_transaction_id
     timestamptz   xci_begin_timestamp
     timestamptz   xci_commit_timestamp
     int64         xci_total_rowcount

 32 bytes total.

Are we sure it's worth including the row count?  I wonder if we ought
to leave that out and let individual clients of the mechanism track
that if they're so inclined, especially since it won't be reliable
anyway.

 CommitTransaction() inside of xact.c will call a function that inserts
 a new record into this array. The operation will most of the time be
 nothing more than taking a spinlock and adding the record to shared memory.
 All the data for the record is readily available, does not require
 further locking and can be collected locally before taking the spinlock.

What happens when you need to switch pages?

 The function will return the sequence number which CommitTransaction()
 in turn will record in the WAL commit record together with the
 begin_timestamp. While both the begin and the commit timestamp
 are crucial to determine what data a particular transaction should have
 seen, the row count is not and will not be recorded in WAL.

It would certainly be better if we didn't have to bloat the commit xlog
records to do this.  Is there any way to avoid that?

 Checkpoint handling will call a function to flush the shared buffers.
 Together with this, the information from WAL records will be sufficient
 to recover this data (except for row counts) during crash recovery.

Right.

 Exposing the data will be done via a set returning function. The SRF
 takes two arguments: the maximum number of rows to return and the last
 serial number processed by the reader. The advantage of such an SRF is that
 the result can be used in a query that right away delivers audit or
 replication log information in transaction commit order. The SRF can
 return an empty set if no further transactions have committed since, or
 an error if data segments needed to answer the request have already been
 purged.

 Purging of the data will be possible in several different ways.
 Autovacuum will call a function that drops segments of the data that are
  outside the postgresql.conf configuration with respect to maximum age
 or data volume. There will also be a function reserved for superusers to
 explicitly purge the data up to a certain serial number.

Dunno if autovacuuming this is the right way to go.  Seems like that
could lead to replication breaks, and it's also more work than not
doing that.  I'd just say that if you turn this on you're responsible
for pruning it, full stop.

 Anyone who trades liberty for security deserves neither
 liberty nor security. -- Benjamin Franklin

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Jan Wieck

On 5/23/2010 6:14 PM, Ron Mayer wrote:

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

So... can we get back to coming up with a reasonable
definition,


(1) no access to system calls (including file and network I/O)


If a PL has file access to its own sandbox (similar to what
Flash seems to do in web browsers), could that be considered
trusted?


That is a good question.

Currently, the first of all TRUSTED languages, PL/Tcl, would allow the 
function of a lesser-privileged user to access the global objects of 
every other database user created within the same session.


These are per-backend, in-memory objects, but nonetheless an evil 
function could just scan the per-backend Tcl namespace and look for 
compromising data, and that's not exactly what TRUSTED is all about.


In the case of Tcl it is possible to create a separate safe 
interpreter per DB role to fix this. I actually think this would be the 
right thing to do.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Jan Wieck

On 5/23/2010 4:48 PM, Greg Sabino Mullane wrote:




Exposing the data will be done via a set returning function. The SRF
takes two arguments: the maximum number of rows to return and the last
serial number processed by the reader. The advantage of such an SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.


In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?


There is a difference between an empty result set and an exception.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Jan Wieck

On 5/23/2010 8:38 PM, Robert Haas wrote:

On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote:

The system will have postgresql.conf options for enabling/disabling the
whole shebang, how many shared buffers to allocate for managing access
to the data and to define the retention period of the data based on data
volume and/or age of the commit records.


It would be nice if this could just be managed out of shared_buffers
rather than needing to configure a separate pool just for this
feature.  But, I'm not sure how much work that is, and if it turns out
to be too ugly then I'd say it's not a hard requirement.  In general,
I think we talked during the meeting about the desirability of folding
specific pools into shared_buffers rather than managing them
separately, but I'm not aware that we have any cases where we do that
today so it might be hard (or not).


I'm not sure the retention policies of the shared buffer cache, the WAL 
buffers, CLOG buffers and every other thing we try to cache are that 
easy to fold into one single set of logic. But I'm all ears.





Each record of the Transaction Commit Info consists of

txid  xci_transaction_id
timestamptz   xci_begin_timestamp
timestamptz   xci_commit_timestamp
int64 xci_total_rowcount

32 bytes total.


Are we sure it's worth including the row count?  I wonder if we ought
to leave that out and let individual clients of the mechanism track
that if they're so inclined, especially since it won't be reliable
anyway.


Nope, we (my belly and I) are not sure about the absolute worth of the 
row count. It would be a convenient number to have there, but I can live 
without it.





CommitTransaction() inside of xact.c will call a function that inserts
a new record into this array. The operation will most of the time be
nothing more than taking a spinlock and adding the record to shared memory.
All the data for the record is readily available, does not require
further locking and can be collected locally before taking the spinlock.


What happens when you need to switch pages?


Then the code will have to grab another free buffer or evict one.




The function will return the sequence number which CommitTransaction()
in turn will record in the WAL commit record together with the
begin_timestamp. While both the begin and the commit timestamp
are crucial to determine what data a particular transaction should have
seen, the row count is not and will not be recorded in WAL.


It would certainly be better if we didn't have to bloat the commit xlog
records to do this.  Is there any way to avoid that?


If you can tell me how a crash-recovering system can figure out what the 
exact sequence number of the WAL commit record at hand should be, 
let's rip it.





Checkpoint handling will call a function to flush the shared buffers.
Together with this, the information from WAL records will be sufficient
to recover this data (except for row counts) during crash recovery.


Right.


Exposing the data will be done via a set returning function. The SRF
takes two arguments: the maximum number of rows to return and the last
serial number processed by the reader. The advantage of such an SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.

Purging of the data will be possible in several different ways.
Autovacuum will call a function that drops segments of the data that are
 outside the postgresql.conf configuration with respect to maximum age
or data volume. There will also be a function reserved for superusers to
explicitly purge the data up to a certain serial number.


Dunno if autovacuuming this is the right way to go.  Seems like that
could lead to replication breaks, and it's also more work than not
doing that.  I'd just say that if you turn this on you're responsible
for pruning it, full stop.


It is an option. "Keep it until I tell you" is a perfectly valid 
configuration option. One you probably don't want to forget about, but 
valid nonetheless.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Andrew Dunstan



Jan Wieck wrote:

On 5/23/2010 6:14 PM, Ron Mayer wrote:

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

So... can we get back to coming up with a reasonable
definition,


(1) no access to system calls (including file and network I/O)


If a PL has file access to its own sandbox (similar to what
Flash seems to do in web browsers), could that be considered
trusted?


That is a good question.

Currently, the first of all TRUSTED languages, PL/Tcl, would allow the 
function of a lesser-privileged user to access the global objects of 
every other database user created within the same session.


These are per-backend, in-memory objects, but nonetheless an evil 
function could just scan the per-backend Tcl namespace and look for 
compromising data, and that's not exactly what TRUSTED is all about.


In the case of Tcl it is possible to create a separate safe 
interpreter per DB role to fix this. I actually think this would be 
the right thing to do.




I think that would probably be serious overkill. Maybe a data stash per 
role, rather than an interpreter per role, would be doable. It would 
certainly be more lightweight.


ISTM we are in danger of confusing several different things. A user that 
doesn't want data to be shared should not stash it in global objects. 
But to me, trusting a language is not about making data private, but 
about not allowing the user to do things that are dangerous, such as 
referencing memory, or the file system, or the operating system, or 
network connections, or loading code which might do any of those things.



cheers

andrew




Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Craig Ringer

On 21/05/10 23:55, Josh Berkus wrote:

So, here's a working definition:

1) cannot directly read or write files on the server.


It must also prevent PL-user-level access to file descriptors already 
open by the backend. That's implicitly covered in the above, but should 
probably be explicit.



2) cannot bind network ports
3) uses only the SPI interface to interact with postgresql tables etc.
4) does any logging only using elog to the postgres log


5) Cannot dynamically load shared libraries from user-supplied locations

(eg in Python, 'import' of a module that had a .so component would be 
blocked unless it was in the core module path)



a) it seems like there should be some kind of restriction on access to
memory, but I'm not clear on how that would be defined.


Like:

6) Has no way to directly access backend memory, i.e. doesn't have 
PL-user-accessible pointers or user access to any C-level calls that 
take/return them. Data structures containing pointers must be opaque to 
the PL user.


The idea being that if you have no access to C APIs that work with 
pointers to memory, and you can't use files (/dev/mem, /proc/self/mem, 
etc), you can't work with backend memory directly.


--
Craig Ringer



Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Jan Wieck

On 5/23/2010 10:04 PM, Andrew Dunstan wrote:


Jan Wieck wrote:

On 5/23/2010 6:14 PM, Ron Mayer wrote:

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

So... can we get back to coming up with a reasonable
definition,


(1) no access to system calls (including file and network I/O)


If a PL has file access to its own sandbox (similar to what
Flash seems to do in web browsers), could that be considered
trusted?


That is a good question.

Currently, the first of all TRUSTED languages, PL/Tcl, would allow a 
function of a lesser-privileged user to access the global objects of 
every other database user created within the same session.


These are per-backend, in-memory objects, but nonetheless an evil 
function could simply scan the per-backend Tcl namespace looking for 
compromising data, and that's not exactly what TRUSTED is all about.


In the case of Tcl it is possible to create a separate safe 
interpreter per DB role to fix this. I actually think this would be 
the right thing to do.




I think that would probably be serious overkill. Maybe a data stash per 
role rather than an interpreter per role would be doable. It would 
certainly be more lightweight.


ISTM we are in danger of confusing several different things. A user that 
doesn't want data to be shared should not stash it in global objects. 
But to me, trusting a language is not about making data private, but 
about not allowing the user to do things that are dangerous, such as 
referencing memory, or the file system, or the operating system, or 
network connections, or loading code which might do any of those things.


How is loading code which might do any of those things different from 
writing a stored procedure that accesses data a careless superuser 
left in a global variable? Remember, the code of a PL function is open 
source, as in: everyone can select from pg_proc. You really don't 
expect anyone to scan for your global variables just because they can 
write functions in the same language?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Craig Ringer

On 22/05/10 02:12, Robert Haas wrote:

On Fri, May 21, 2010 at 1:58 PM, David Fetter da...@fetter.org wrote:

On Fri, May 21, 2010 at 01:45:45PM -0400, Stephen Frost wrote:

* David Fetter (da...@fetter.org) wrote:

That is *precisely* the business we need to be in, at least for the
languages we ship, and it would behoove us to test languages we don't
ship so we can warn people when they don't pass.


k, let's start with something simpler first tho- I'm sure we can pull in
the glibc regression tests and run them too.  You know, just in case
there's a bug there, somewhere.


That's a pretty pure straw man argument.  I expect much higher-quality
trolling.  D-.


I'm sorely tempted to try to provide some higher-quality trolling, but
in all seriousness I think that (1) we could certainly use much better
regression tests in many areas of which this is one and (2) it will
never be possible to catch all security bugs - in particular - via
regression testing because they typically stem from cases people
didn't consider.  So... can we get back to coming up with a reasonable
definition, and if somebody wants to write some regression tests, all
the better?


Personally, I don't think a PL should be trusted unless it _does_ define 
a whitelist of operations. Experience in the wider world has shown that 
this is the only approach that works. Regression testing to make sure 
all possible approaches to accessing unsafe features are blocked is doomed 
to have holes where there's another approach that hasn't been thought of 
yet.


Perl's new approach is whitelist based. Python restricted mode failed 
not least because it was a blacklist and people kept on finding ways 
around it. Lua and JavaScript are great examples of whitelist 
approaches, where the language just doesn't expose features that are 
dangerous - in fact, the core language doesn't even *have* those 
features. PL/PgSQL is the same, and works well as a trusted language for 
that reason.


Java's SecurityManager is whitelist based (allowed classes, allowed 
operations), and has proved very secure.


--
Craig Ringer



Re: [HACKERS] Specification for Trusted PLs?

2010-05-23 Thread Andrew Dunstan



Jan Wieck wrote:


ISTM we are in danger of confusing several different things. A user 
that doesn't want data to be shared should not stash it in global 
objects. But to me, trusting a language is not about making data 
private, but about not allowing the user to do things that are 
dangerous, such as referencing memory, or the file system, or the 
operating system, or network connections, or loading code which might 
do any of those things.


How is loading code which might do any of those things different 
from writing a stored procedure that accesses data a careless 
superuser left in a global variable? Remember, the code of a PL 
function is open source, as in: everyone can select from 
pg_proc. You really don't expect anyone to scan for your global 
variables just because they can write functions in the same language?




Well, that threat arises from the unsafe actions of the careless 
superuser. And we could at least ameliorate it by providing a per-role 
data stash, at very little cost, as I mentioned. It's not like we don't 
know about such threats, and I'm certainly not pretending they don't 
exist. The 9.0 PL/Perl docs say:


   The %_SHARED variable and other global state within the language is
   public data, available to all PL/Perl functions within a session.
   Use with care, especially in situations that involve use of multiple
   roles or SECURITY DEFINER functions.


But the threats I was referring to arise if the language allows them to, 
without any requirement for unsafe actions by another user. Protecting 
against those is the essence of trustedness in my mind at least.


cheers

andrew



Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-23 Thread Fujii Masao
On Wed, May 19, 2010 at 2:47 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Oh, right. How about allowing the postmaster to invoke walreceiver
 only in the PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, or
 PM_WAIT_READONLY states? We can keep walreceiver alive until all
 read-only backends have gone, and prevent unexpected startup of
 walreceiver.

 Yes, that seems like something we should be checking, if we aren't already.

 I'll do that.

Here is the updated version. I added the above-mentioned check
into the patch.
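The condition being added amounts to a simple state check; sketched in Python pseudocode (the PM_* names come from the quoted proposal, and this is not an excerpt from the actual patch):

```python
# Pseudocode-level sketch of the proposed guard: the postmaster may start
# walreceiver only while recovery is (or may still be) in progress.
WALRECEIVER_OK_STATES = {
    "PM_STARTUP", "PM_RECOVERY", "PM_HOT_STANDBY", "PM_WAIT_READONLY",
}

def may_start_walreceiver(pm_state):
    """True if the postmaster state permits launching walreceiver."""
    return pm_state in WALRECEIVER_OK_STATES
```

Any request to start walreceiver in other states (normal operation, shutdown) would simply be refused, preventing the unexpected startups described above.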

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_smart_shutdown_in_recovery_v4_fujii.patch
Description: Binary data
