date:20081211


On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  When the WAL starts streaming the *primary* can immediately perform
  synchronous replication, i.e. commit waits for transfer. 
 
 Until the standby has obtained all the missing log files, it's not 
 up-to-date, and there's no guarantee that it can finish the replay. For 
 example, imagine that your archive_command is an scp from the primary to 
 the standby. If a lightning strikes the primary before some WAL file has 
 been copied over to the archive directory in the standby, the standby 
 can't catch up. In the primary then, what's the point for a commit to 
 wait for transfer, if the reply from the standby doesn't guarantee that 
 the transaction is safe in the standby?

The WAL files will have already left the primary. 

The timeline, as I understand it, is:
1 [Primary] Set up continuous archiving 
2 [Primary] Take base backup
3 [Standby] Connect to primary to initiate streaming
4 [Primary] Log switch and, optionally, turn off archiving
5 [Standby] Begin replaying files, initially from archive
6 [Standby] Switch to replaying WAL records immediately after streaming

So sync rep would turn on after step 4, so that all intermediate WAL
files have been sent to the archive. If we lose the Primary after this
point then all transactions are accessible to standby. If we lose the
Standby or Archive, then we need to replace them and re-run the above.
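
A minimal sketch of when commits start waiting in the timeline above. The
step names and the intermediate_wal_archived flag are hypothetical, not taken
from the patch; the flag stands for "every WAL file written before the log
switch has reached the archive".

```c
#include <stdbool.h>

/* Hypothetical labels for the six setup steps listed above. */
typedef enum SetupStep
{
    STEP_ARCHIVING_SET_UP = 1,   /* 1 [Primary] */
    STEP_BASE_BACKUP_TAKEN,      /* 2 [Primary] */
    STEP_STREAMING_CONNECTED,    /* 3 [Standby] */
    STEP_LOG_SWITCHED,           /* 4 [Primary] */
    STEP_REPLAY_FROM_ARCHIVE,    /* 5 [Standby] */
    STEP_REPLAY_FROM_STREAM      /* 6 [Standby] */
} SetupStep;

/*
 * Commits wait for transfer to the standby only once step 4 is done and
 * all intermediate WAL files have been sent to the archive.
 */
bool
commit_waits_for_standby(SetupStep last_completed,
                         bool intermediate_wal_archived)
{
    return last_completed >= STEP_LOG_SWITCHED && intermediate_wal_archived;
}
```

For example, commit_waits_for_standby(STEP_LOG_SWITCHED, false) is false:
the log switch alone is not enough until the archive has caught up.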

The above was outlined on the thread "Synchronous Log Shipping Replication"
and pretty much all agreed on 18 Sep.

Recent changes I have requested in the architecture are:
* making archiving optional on primary, so we don't need to send WAL
data *twice*. 
* allowing streaming/startup process to work together via shared memory,
to reduce average replication delay and improve performance
* skip archiving/de-archiving step on standby because it's superfluous
(all on this thread)

All of those are fairly minor code changes, but reduce complexity of
solution and significantly reduce the amount of copying of WAL files (3
copy actions to/from archive removed without loss of robustness). I
would have made the suggestions earlier but it wasn't until I saw the
architecture diagrams that I understood the intention of the code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: default values for function parameters

2008/12/10 Pavel Stehule [EMAIL PROTECTED]:
 2008/12/10 Tom Lane [EMAIL PROTECTED]:
 Pavel Stehule [EMAIL PROTECTED] writes:
 Next argument: if we accept AS for param names, then we introduce
 inconsistent behaviour with SQL/XML functions.

 select xmlforest(c1, c2 as foo, c3) -- here foo doesn't mean
 "use it as param foo"

 It could be read as meaning that, I think.

 In any case, I'm not wedded to using AS for this, and am happy to
 consider other suggestions.

What do you think about this?

select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)

Example:
select dosome(10,20,30, SET flaga = true, flagb = false)

regards
Pavel Stehule


 me too

 regards
 Pavel Stehule

 But = isn't acceptable.

regards, tom lane




Re: [HACKERS] Sync Rep: First Thoughts on Code


Simon Riggs wrote:

 On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
  Simon Riggs wrote:
   When the WAL starts streaming the *primary* can immediately perform
   synchronous replication, i.e. commit waits for transfer.
  
  Until the standby has obtained all the missing log files, it's not
  up-to-date, and there's no guarantee that it can finish the replay. For
  example, imagine that your archive_command is an scp from the primary to
  the standby. If a lightning strikes the primary before some WAL file has
  been copied over to the archive directory in the standby, the standby
  can't catch up. In the primary then, what's the point for a commit to
  wait for transfer, if the reply from the standby doesn't guarantee that
  the transaction is safe in the standby?
 
 The WAL files will have already left the primary.
 
 Timeline is this in my understanding
 1 [Primary] Set up continuous archiving
 2 [Primary] Take base backup
 3 [Standby] Connect to primary to initiate streaming
 4 [Primary] Log switch and, optionally, turn off archiving
 5 [Standby] Begin replaying files, initially from archive
 6 [Standby] Switch to replaying WAL records immediately after streaming
 
 So sync rep would turn on after step 4, so that all intermediate WAL
 files have been sent to the archive.  If we lose the Primary after this
 point then all transactions are accessible to standby. If we lose the
 Standby or Archive, then we need to replace them and re-run the above.


Between steps 4 and 5, there's no guarantee that all WAL files generated
after step 3 and before the start of streaming have already been archived.
There's a delay between writing a WAL file and when the file has been
safely archived. If you lose the primary during that window, the standby
will have old WAL files in the archive and the most recent ones received
by walreceiver, but it's missing the WAL files generated just before the
switch to streaming mode.



 Recent changes I have requested in the architecture are:
 * making archiving optional on primary, so we don't need to send WAL
 data *twice*.


Agreed. I'm not so much worried about the bandwidth, but it's a lot of
extra work from an administration point of view. It's very hard to get it
right, so that you eliminate windows like the above.


As the patch stands, if you turn off archiving in the primary, and the 
standby ever disconnects, even for only a few seconds, the standby will 
miss any WAL generated until it reconnects, and without archiving 
there's no way for the standby to get hold of the missed WAL.



 * allowing streaming/startup process to work together via shared memory,
 to reduce average replication delay and improve performance
 * skip archiving/de-archiving step on standby because it's superfluous
 (all on this thread)
 
 All of those are fairly minor code changes, but reduce complexity of
 solution and significantly reduce the amount of copying of WAL files (3
 copy actions to/from archive removed without loss of robustness). I
 would have made the suggestions earlier but it wasn't until I saw the
 architecture diagrams that I understood the intention of the code.


To make archiving optional in the primary, I don't see any other choice 
than adding the capability for the standby to request arbitrary WAL 
files from the primary, over the wire. That seems like a pretty 
significant change to walsender: it needs to be able to read WAL not 
only from wal_buffers, but from files. That would be a good idea for 
performance reasons, too: currently if there's a network glitch and the 
primary doesn't get acknowledgements from the standby for a short while, 
XLogInserts in the primary will block waiting for the standby after 
wal_buffers fills up. That's not a big deal for synchronous replication, 
but in asynchronous mode you don't want network glitches like that to 
stall the primary.


And of course it means changes in the startup code as well. And we'll 
need bookkeeping in the primary of what WAL the standby has already 
received, so that it doesn't recycle the WAL segments until they've been 
sent to the standby. Or alternatively, the primary needs to be able to 
retrieve segments from the archive, but then we're dependent on 
archiving again.
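
A rough sketch of the bookkeeping described above, assuming the primary
tracks the newest WAL segment it has fully sent. The struct, field, and
function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-standby state kept on the primary. */
typedef struct StandbyProgress
{
    bool     attached;          /* is a standby being tracked at all? */
    uint64_t last_sent_segno;   /* newest WAL segment fully sent to it */
} StandbyProgress;

/*
 * A segment may be recycled only if the checkpoint logic no longer needs
 * it and, when a standby is tracked, the standby has already been sent it.
 */
bool
wal_segment_recyclable(const StandbyProgress *sp, uint64_t segno,
                       uint64_t oldest_needed_by_checkpoint)
{
    if (segno >= oldest_needed_by_checkpoint)
        return false;           /* still needed locally */
    if (sp->attached && segno > sp->last_sent_segno)
        return false;           /* standby has not received it yet */
    return true;
}
```

The second test is what lets WAL pile up on the primary while a tracked
standby is unreachable; that is the cost of not depending on the archive.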


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


[HACKERS] COCOMO Indians

Hi, Pgsql-hackers.

We would like to obtain your opinion on these two questions:


1) We want to add capabilities to the Postgres engine, and want to get an
upper estimate of the size of the code and the cost and time of
implementation.
1.1) We divide the capabilities into elementary features, find analogues in
already written code, and suppose, e.g., that the number of lines for
'create timer' will be similar to that for 'create function', and that
implementing 'create timer' is easier than implementing 'create function'
(because it already has a prototype in 'create function', and copying
source code is possible).
1.2) We calculate cost and time by COCOMO http://en.wikipedia.org/wiki/Cocomo

How relevant is this estimation?


2) We are attracted by the price of Indian developers,
but we have heard much about the low quality of code written by Indians,
and we fear that an American company will resell the implementation to an
Indian subcontractor
(i.e. the real developers will be Indians anyway).

What requirements should code written by Indians satisfy to be included in
the next version of Postgres?



Dmitry (SQL50, HTML60)




Re: [HACKERS] Sync Rep: First Thoughts on Code


On Thu, 2008-12-11 at 11:29 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  When the WAL starts streaming the *primary* can immediately perform
  synchronous replication, i.e. commit waits for transfer. 
  Until the standby has obtained all the missing log files, it's not 
  up-to-date, and there's no guarantee that it can finish the replay. For 
  example, imagine that your archive_command is an scp from the primary to 
  the standby. If a lightning strikes the primary before some WAL file has 
  been copied over to the archive directory in the standby, the standby 
  can't catch up. In the primary then, what's the point for a commit to 
  wait for transfer, if the reply from the standby doesn't guarantee that 
  the transaction is safe in the standby?
  
  The WAL files will have already left the primary. 
  
  Timeline is this in my understanding
  1 [Primary] Set up continuous archiving 
  2 [Primary] Take base backup
  3 [Standby] Connect to primary to initiate streaming
  4 [Primary] Log switch and, optionally, turn off archiving
  5 [Standby] Begin replaying files, initially from archive
  6 [Standby] Switch to replaying WAL records immediately after streaming
  
  So sync rep would turn on after step 4, so that all intermediate WAL
  files have been sent to the archive.  If we lose the Primary after this
  point then all transactions are accessible to standby. If we lose the
  Standby or Archive, then we need to replace them and re-run the above.
 
 Between steps 4 and 5, there's no guarantee that all WAL files generated 
 after step 3 and the start of streaming have already been archived. 
 There's a delay between writing a WAL file and when the file has been 
 safely archived. If you lose the primary during that window, the standby 
 will have old WAL files in the archive, the most recent ones received 
 by walreceiver, but it's missing the WAL files generated just before the 
 switch to streaming mode.

I was presuming that the synchronisation was clear, but I'm sorry it
wasn't. Sync rep would begin only *after* the last WAL file was
archived.

  Recent changes I have requested in the architecture are:
  * making archiving optional on primary, so we don't need to send WAL
  data *twice*. 
 
 Agreed. I'm not so much worried about the bandwidth, but it's a lot of 
 extra work from administration point of view. It's very hard to get it 
 right, so that you eliminate windows like the above.
 
 As the patch stands, if you turn off archiving in the primary, and the 
 standby ever disconnects, even for only a few seconds, the standby will 
 miss any WAL generated until it reconnects, and without archiving 
 there's no way for the standby to get hold of the missed WAL.

I described earlier that archiving would turn back on again if the
replication ever failed (with correct synchronisation).

All I've asked for is the ability to turn off and turn back on archiving,
yes, with synchronisation so it's safe.

Personally, I think people will laugh if we tell them we decided to ship
all the data twice and couldn't see another way. That's the kind of
thing people give presentations at PGcon about...

  * allowing streaming/startup process to work together via shared memory,
  to reduce average replication delay and improve performance
  * skip archiving/de-archiving step on standby because it's superfluous
  (all on this thread)
  
  All of those are fairly minor code changes, but reduce complexity of
  solution and significantly reduce the amount of copying of WAL files (3
  copy actions to/from archive removed without loss of robustness). I
  would have made the suggestions earlier but it wasn't until I saw the
  architecture diagrams that I understood the intention of the code.
 
 To make archiving optional in the primary, I don't see any other choice 
 than adding the capability for the standby to request arbitrary WAL 
 files from the primary, over the wire. 

I don't think that's the only or even a desirable way. We cannot allow a
build up of WAL files to occur on the primary.

Making archiving optional isn't the big deal you're saying it is.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-11 Thread Fujii Masao

Hi,

On Thu, Dec 11, 2008 at 7:09 PM, Simon Riggs [EMAIL PROTECTED] wrote:
  Recent changes I have requested in the architecture are:
  * making archiving optional on primary, so we don't need to send WAL
  data *twice*.

 Agreed. I'm not so much worried about the bandwidth, but it's a lot of
 extra work from administration point of view. It's very hard to get it
 right, so that you eliminate windows like the above.

 As the patch stands, if you turn off archiving in the primary, and the
 standby ever disconnects, even for only a few seconds, the standby will
 miss any WAL generated until it reconnects, and without archiving
 there's no way for the standby to get hold of the missed WAL.

 I described earlier that archiving would turn back on again if the
 replication ever failed (with correct synchronisation).

 All I've asked for is the ability to turn off and turn back on archiving,
 yes, with synchronisation so it's safe.

 Personally, I think people will laugh if we tell them we decided to ship
 all the data twice and couldn't see another way. That's the kind of
 thing people give presentations at PGcon about...


OK, I will add such an archiving feature. My new design for archiving is as
follows.

Primary
--
I will extend archive_mode as follows, so that the user can choose the
archiving strategy on the primary.

- always
  The primary always archives the WAL. This is compatible with the current
  (=8.3) archive_mode = on.

- none
  The primary never archives the WAL. This is compatible with the current
  archive_mode = off.

- standalone
  The primary archives the WAL except during replication. That is, the
  primary switches modes whenever replication starts / ends.

  [FLS-SLS]
  When replication starts, the primary disables archiving *after* the switched
  WAL file is archived. WAL streaming doesn't need to wait for archiving to be
  disabled, so processing on the primary isn't blocked by the start of
  replication. But both WAL streaming and archiving would be in progress
  for a while (until the switched WAL file is archived) after replication
  starts.

  [SLS-FLS]
  When replication ends, the primary restarts archiving immediately. This
  also doesn't block processing on the primary. But this might cause
  the loss of some files from the archive if archiving is slow on the standby.
  Should the primary look for the last file archived (by the standby) in
  the archive and restart archiving from the subsequent file? Of course,
  the primary cannot archive a file that has already been removed from the
  primary.
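
A minimal sketch of the three proposed modes as a decision function. The
enum and function names are hypothetical; switch_segment_archived stands for
"the WAL file completed by the log switch at replication start has been
archived", which is the [FLS-SLS] condition above.

```c
#include <stdbool.h>

/* Hypothetical names for the three proposed archive_mode values. */
typedef enum ArchiveMode
{
    ARCHIVE_ALWAYS,       /* like archive_mode = on in 8.3 */
    ARCHIVE_NONE,         /* like archive_mode = off */
    ARCHIVE_STANDALONE    /* archive only while not replicating */
} ArchiveMode;

/* Should the primary archive a WAL file that has just been completed? */
bool
should_archive(ArchiveMode mode, bool replicating,
               bool switch_segment_archived)
{
    switch (mode)
    {
        case ARCHIVE_ALWAYS:
            return true;
        case ARCHIVE_NONE:
            return false;
        case ARCHIVE_STANDALONE:
            /* keep archiving until the switched WAL file is archived */
            return !replicating || !switch_segment_archived;
    }
    return true;    /* not reached for valid modes; archiving is the safe default */
}
```

In standalone mode, streaming and archiving overlap for a while: with
replicating = true and switch_segment_archived = false, the function still
returns true.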

Standby
---
I would add a new option for archiving during recovery to recovery.conf
(recovery_archive_mode). Though this option is similar to archive_mode,
merging them would confuse the user more, I think. Or should I merge them?
And do you want to configure the archive command only for recovery?
If so, I would add a new option to specify the archive command during
recovery (recovery_archive_command).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Sync Rep: First Thoughts on Code

2008-12-11 Thread Fujii Masao

Hi,

On Thu, Dec 11, 2008 at 7:09 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Thu, 2008-12-11 at 11:29 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  When the WAL starts streaming the *primary* can immediately perform
  synchronous replication, i.e. commit waits for transfer.
  Until the standby has obtained all the missing log files, it's not
  up-to-date, and there's no guarantee that it can finish the replay. For
  example, imagine that your archive_command is an scp from the primary to
  the standby. If a lightning strikes the primary before some WAL file has
  been copied over to the archive directory in the standby, the standby
  can't catch up. In the primary then, what's the point for a commit to
  wait for transfer, if the reply from the standby doesn't guarantee that
  the transaction is safe in the standby?
 
  The WAL files will have already left the primary.
 
  Timeline is this in my understanding
  1 [Primary] Set up continuous archiving
  2 [Primary] Take base backup
  3 [Standby] Connect to primary to initiate streaming
  4 [Primary] Log switch and, optionally, turn off archiving
  5 [Standby] Begin replaying files, initially from archive
  6 [Standby] Switch to replaying WAL records immediately after streaming
 
  So sync rep would turn on after step 4, so that all intermediate WAL
  files have been sent to the archive.  If we lose the Primary after this
  point then all transactions are accessible to standby. If we lose the
  Standby or Archive, then we need to replace them and re-run the above.

 Between steps 4 and 5, there's no guarantee that all WAL files generated
 after step 3 and the start of streaming have already been archived.
 There's a delay between writing a WAL file and when the file has been
 safely archived. If you lose the primary during that window, the standby
 will have old WAL files in the archive, the most recent ones received
 by walreceiver, but it's missing the WAL files generated just before the
 switch to streaming mode.

Yes. Since such a standby is unsafe, the user must not promote it to primary.
Instead, the user has to stop the standby (without completing recovery),
restart the primary, and then restart the standby.


 I was presuming that the synchronisation was clear, but I'm sorry it
 wasn't. Sync rep would begin only *after* the last WAL file was
 archived.

Agreed. In order for the user to confirm whether replication began or
not, we might need to log the name of the switched WAL file.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Sync Rep: First Thoughts on Code


On Wed, 2008-12-10 at 15:06 -0500, Aidan Van Dyk wrote:

 Call me thick, but I'm confused... In sync rep, there *can't be* any
 catching up to do... i.e. if the slave isn't accepting the WAL the
 master stops doing *anything*...

In normal/steady state, yes, you are correct. But there is more...

The simplest way to configure a standby would be to freeze the primary
while we set up the standby and then go straight into normal/steady
state. That could mean hours of downtime for large databases, which is
unacceptable in a feature aimed at increasing availability. So we need
to allow the primary to continue working while the standby is set up.
That then creates a log gap between the LSN of the primary and the LSN
of the standby, which must be resolved.

So the catchup occurs during the transient initial phase when standby is
catching up with primary before they continue together in normal/steady
state. 

Most of the architectural discussion over last few months has been about
the need for the initial state and how to handle it. Most of the code
complexity also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)


Bruce Momjian wrote:

Bruce Momjian wrote:

KaiGai Kohei wrote:

   CREATE TABLE t (
   a   int,
   b   text
   ) WITH (ROW_LEVEL_ACL=ON);


Let me outline the simplest API, assuming we are using table-level
granularity for the security columns.

CREATE TABLE would support

WITH (ROWACL = TRUE/FALSE);


 And then in postgresql.conf we would have:

default_with_rowacl

Yes, I agree with it.

But SE-PostgreSQL does not need a table option to control its
availability at per-table granularity, because of its security model.

Database ACL is a kind of DAC. It allows resource owners to
set up access rights on their resources. On the other hand, SE-PostgreSQL
is an implementation of MAC. It does not allow owners to control
access rights. This is the role of the centralized security policy.


When SE-Linux is enabled, CREATE TABLE would issue an error if SECEXT
was false.  I can't think of a clean way to guarantee that existing
tables have SECEXT though, which means we might need to have a missing
'security_context' column mean default SE-Linux permissions.


SE-PostgreSQL stores its security context in the security field of
HeapTupleHeader and sets HEAP_HASSECURITY in t_infomask.
The security system column is always available, so this does not
matter. When no guest is available on PGACE, HEAP_HASSECURITY is not
set in t_infomask, so the security field is not allocated and the NULL
bitmap is not polluted.

 If we assume users set up Row-level ACLs for specific tables, per-table
 option is meaningful for reduction of NULL-bitmap space in the tuple
 without any NULL-values on general columns.

 Right.  I was hoping there was a way to have HEAP_HASSECACL control if
 the value is present or not.

 I sure wish others were adding ideas to this discussion.

I have a plan to add a new field (declared as int2 relrowacl) to
pg_class to show which column stores its row-level ACLs.
When we create a table with (ROWACL=TRUE), it implicitly adds a column
declared as security_acl aclitem[], and its attribute number is
stored in pg_class.relrowacl. If it has a positive value,
tuples within the table can have individual ACLs. No ACL is
represented via the NULL bitmap. If it is zero, the table does not
have the security_acl column, and the row-level controls are simply
ignored.
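
A minimal sketch of how the relrowacl rule might be checked. ClassRow is a
hypothetical, trimmed-down stand-in for a pg_class row, and treating a NULL
security_acl as "no row-level check for this tuple" is an assumption about
what "No ACL" implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one pg_class field of interest. */
typedef struct ClassRow
{
    int16_t relrowacl;   /* attribute number of security_acl, or 0 */
} ClassRow;

/* A positive relrowacl means the table has a security_acl column. */
bool
table_has_row_acl(const ClassRow *rel)
{
    return rel->relrowacl > 0;
}

/*
 * Row-level control applies only when the table has the column and this
 * tuple's security_acl is not NULL (assumption: NULL means no ACL).
 */
bool
row_acl_check_needed(const ClassRow *rel, bool tuple_acl_is_null)
{
    return table_has_row_acl(rel) && !tuple_acl_is_null;
}
```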

Thanks,
--
KaiGai Kohei [EMAIL PROTECTED]


Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Jonah H. Harris

On Thu, Dec 11, 2008 at 4:43 AM, Dmitry Turin
[EMAIL PROTECTED] wrote:
 We would like to obtain your opinion on these two questions:

This is the wrong place to do it.

 2) We are captivated by price of Indians,
 we listened much about low quality of code, written by Indians,
 we are fearing, that American company will resale implementation to Indian 
 subcontractor
 (i.e. real developers will be Indians anyway).

Did you really just say that?

 (SQL50, HTML60)

Because it seems that you haven't got the hint yet, I'll just say it
frankly: No one really cares about your desired additions to Postgres.

-Jonah


Re: [HACKERS] Sync Rep: First Thoughts on Code


On Thu, 2008-12-11 at 19:19 +0900, Fujii Masao wrote:
 
  All I've asked for is the ability to turn off and turn back on archiving,
  yes, with synchronisation so it's safe.
 (snip)

 OK, I will add such archiving feature. My new design of archiving is as 
 follows.
 
 Primary
 --
 I extend archive_mode as follows and make the user be able to choose the
 archiving strategy on the primary.
 
 - always
   The primary always archives the WAL. This is compatible with current (=8.3)
   archive_mode = on.
 
 - none
   The primary always doesn't archive the WAL. This is compatible with current
   archive_mode = off.
 
 - standalone
   The primary doesn't archive the WAL only during replication. If replication
   is not in progress, the primary archives the WAL. That is, the primary
   switches the modes whenever replication starts / ends.
 
   [FLS-SLS]
   When replication starts, the primary disable archiving *after* the switched
   WAL file is archived. WAL streaming doesn't need to wait for disablement
   of archiving, so the processing on the primary isn't blocked by starting of
   replication. But, both WAL streaming and archiving would be in progress
   for a while (until the switched WAL file is archived) after
 replication starts.

I'm OK with that, but that is slightly different from what Heikki had
said in relation to the point at which sync rep begins on primary, so he
may have a different view.

synchronous_replication means that if a standby server has connected to
us, we will wait for all WAL associated with a transaction to be
transferred prior to commit. So there is never a 100% guarantee that
the transaction is safe, just an "if possible, 100%".

So this implements the equivalent of DRBD Protocol A and B. Do we have
an option to allow the WALreceiver to fsync the WAL file after a commit
is received, which would make it equivalent to Protocol C? If we don't,
I'm OK with that since it reduces performance so much it isn't a
practical option in many cases. 
http://www.drbd.org/users-guide/s-replication-protocols.html
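
To make the comparison concrete, here is a rough sketch of the three
acknowledgement levels. The enum, struct, and function names are
hypothetical; the mapping follows DRBD's documented protocols (A: data
handed to the local send buffer, B: data has reached the peer, C: data
written to the peer's disk).

```c
#include <stdbool.h>

/* Hypothetical acknowledgement levels, by analogy with DRBD. */
typedef enum AckLevel
{
    ACK_SENT,        /* like DRBD protocol A */
    ACK_RECEIVED,    /* like DRBD protocol B */
    ACK_FSYNCED      /* like DRBD protocol C */
} AckLevel;

/* What the primary currently knows about one commit's WAL. */
typedef struct WalProgress
{
    bool sent;       /* handed to the primary's send buffer */
    bool received;   /* standby reported that it holds the WAL */
    bool fsynced;    /* standby reported the WAL is flushed to disk */
} WalProgress;

/* May the commit return to the client under the required level? */
bool
commit_may_return(AckLevel required, const WalProgress *p)
{
    switch (required)
    {
        case ACK_SENT:
            return p->sent;
        case ACK_RECEIVED:
            return p->sent && p->received;
        case ACK_FSYNCED:
            return p->sent && p->received && p->fsynced;
    }
    return false;    /* unknown level: keep waiting */
}
```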

   [SLS-FLS]
   When replication ends, the primary restarts archiving immediately. This
   also doesn't block the processing on the primary. But, this might cause
   loss of some files from an archive if archiving is slow on the standby.
   The primary should look for the last archived file (by the standby) from
   an archive and restart archiving from the subsequent file? Of course,
   the primary cannot archive it if it's already removed on the primary.

Standby will always have kept enough files to allow it to restart from
the last restartpoint, so a gap in the file sequence is unlikely, as
long as we archive the WAL file that contains the last LSN we
transferred before streaming failed. That conceivably might mean we need
to write a .ready message after a WAL file has filled, which might mean we
have problems if the replication timeout is longer than the checkpoint
timeout, but that seems an unlikely configuration. And if anybody has a
problem with that we just recommend they use the always mode.

 Standby
 ---
 I would add a new option for archiving during recovery to recovery.conf
 (recovery_archive_mode). Though this option is similar to archive_mode,
 merging them would confuse the user more, I think. Or, I should merge?
 And, do you want to configure the archive command only for recovery?
 If so, I would add new option to specify the archive command during
 recovery (recovery_archive_command).

I think if you really want two archives or archiving during recovery
then this is desirable to avoid confusion.

Explaining all this in the docs will be fun. :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [HACKERS] visibility maps

2008-12-11 Thread Zdenek Kotala


Heikki Linnakangas wrote:

Pavan Deolasee wrote:

/*
 * We don't need to lock the page, as we're only looking at a single
 * bit.
 */
result = (map[mapByte] & (1 << mapBit)) ? true : false;


Isn't this a dangerous assumption to make? I am not so sure that even
a bit can be read atomically on all platforms.


Umm, what non-atomic state could the bit be in? Half-set, half-cleared?
Or do you think that if some other bit in proximity is changed, the
other bit would temporarily flip 0->1->0, or something like that? I
don't think that should happen.




IIRC, memory reads and writes are atomic operations. Only one CPU (hw thread)
can access the same memory address(es)* at the same time**. The question is how
the compiler compiles the C code to assembler. But this code seems safe to me.
I think we use the same principle for TransactionID?



Zdenek

* The width is architecture dependent, and of course it depends on alignment.
I'm not sure how x86 CPUs read unaligned memory. But if you enable this on
SPARC, it is handled in software through trap handlers and it is not really
atomic. But in our case we use byte access, which is safe everywhere.


** IIRC, some AMD64 processors allow disabling the cache coherence check, but
this is not used in standard OSes, and we can ignore it.
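
A self-contained sketch of the byte-sized read discussed above. The function
name and the bit numbering (8 bits per map byte, lowest bit first) are
assumptions for illustration, not the actual visibility map layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_MAP_BYTE 8

/*
 * Test one bit of a bitmap without taking a lock. Only a single byte is
 * loaded, so the read cannot be split across two memory accesses.
 */
bool
map_bit_is_set(const uint8_t *map, size_t bitno)
{
    size_t   mapByte = bitno / BITS_PER_MAP_BYTE;
    unsigned mapBit = (unsigned) (bitno % BITS_PER_MAP_BYTE);
    uint8_t  byte = map[mapByte];    /* one byte-wide load */

    return (byte & (1u << mapBit)) != 0;
}
```

The result can still be stale by the time the caller uses it; the sketch
only shows that the read itself involves a single byte.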





[HACKERS] Refactoring SearchSysCache + HeapTupleIsValid


Our code contains about 200 copies of the following code:

tuple = SearchSysCache[Copy](FOOOID, ObjectIdGetDatum(fooid), 0, 0, 0);
if (!HeapTupleIsValid(tuple))
    elog(ERROR, "cache lookup failed for foo %u", fooid);

This only counts elog() calls, not user-facing error messages 
constructed with ereport().


Shouldn't we try to refactor this, maybe like this:

HeapTuple
SearchSysCache[Copy]Oid(int cacheId, Oid key)
{
    HeapTuple tuple;

    tuple = SearchSysCache[Copy](cacheId, ObjectIdGetDatum(key),
                                 0, 0, 0);
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed in cache %d (relation %u) for OID %u",
             cacheId, cacheinfo[cacheId].reloid, key);

    return tuple;
}

Maybe some other verb than Search could be used to make it clearer 
that this function has its own error handler.



Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Sreejesh O S

On Thu, Dec 11, 2008 at 3:13 PM, Dmitry Turin 
[EMAIL PROTECTED] wrote:

 Hi, Pgsql-hackers.

 We would like to obtain your opinion on these two questions:


 1) We want to add capabilities to the Postgres engine, and want to get an
 upper estimate for the size of code, cost and time of implementation.
 1.1) We divide the capabilities into elementary features, find analogues in
 already written code, and suppose e.g. that the number of lines for
 'create timer' will be similar to 'create function', and that implementing
 'create timer' is easier than implementing 'create function' (because it
 already has a prototype in 'create function', and copying source code is
 possible)
 1.2) We calculate cost and time by COCOMO
 http://en.wikipedia.org/wiki/Cocomo

 How relevant is this estimation ?


 2) We are captivated by price of Indians,
 we listened much about low quality of code, written by Indians,

How did you determine that code written by Indians is low in quality? Did
you subcontract to an Indian company or something similar?


 we are fearing, that American company will resale implementation to Indian
 subcontractor
 (i.e. real developers will be Indians anyway).



 What requirements should satisfy code, written by Indians, to be in next
 version of Postgres ?



 Dmitry (SQL50, HTML60)




Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Ibrar Ahmed

Do we really need this kind of discussion here?







-- 
   Ibrar Ahmed
   EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Sreejesh O S

On Thu, Dec 11, 2008 at 5:24 PM, Ibrar Ahmed [EMAIL PROTECTED] wrote:

 Do we really need this kind of discussion here?

Don't know. But the post should have contained more specific matters.




Re: [HACKERS] COCOMO Indians

Hi, Jonah and Ibrar.

 This is the wrong place to do it.

Suggest another place.

 2) We are captivated by price of Indians,
 we listened much about low quality of code, written by Indians,
 we are fearing, that American company will resale implementation to Indian 
 subcontractor
 (i.e. real developers will be Indians anyway).
 Did you really just say that?

My letters are perlustrated by bank security command (team).

 it seems that you haven't got the hint yet

Already done. The current results are the following:
A1) Cost in the US is 3 times the cost in India (i.e. the coefficient is 3);
the coefficient was obtained as an expert estimate, independently of the figures below.
A2) The average in India is nearly 50 000$; the average in the US is nearly 150 000$.
There is a wide dispersion (range) of prices (the dispersion for India is bigger).
Both countries report that cost in Europe is above cost in the US.
A3) A contributor from Pg reports that the cost is 50 000 euro, i.e. 75 000$.

We interpret this in the following way:
B1) We can fit into 75 000 if we search for developers for long enough;
we can get developers right now if we pay twice as much (i.e. 75*2=150).
B2) Therefore the cost in India is inflated twofold, i.e. the actual cost is 25 000.
B3) We have no reason to believe the US; the real developers will probably be in India.

Do we really need this kind of discussion here?

Bosses do not like dispersion.
Besides this, 50 000 itself seems too much for India to be the mathematical expectation.



Dmitry (SQL50, HTML60)




Re: [HACKERS] COCOMO Indians

 This is the wrong place to do it.
 Suggest another place.

No.  This topic is off-topic for the mailing list.  When someone
brings up an issue that is off-topic, it is not our job to find them
another place to discuss it.

...Robert


Re: [HACKERS] COCOMO Indians

Hi, Robert.

 This is the wrong place to do it.
 Suggest another place.
 This topic is off-topic for the mailing list.

I feel a strong desire to remind you that Pg has, so far, no concrete
instructions on how to hire people to add features to the engine.

And of course, there are no useful recommendations for orienting oneself in the market.

P.S.
I agree to continue the discussion about features of the site only on another
mailing list, if you wish.


Dmitry (SQL50, HTML60)




[HACKERS] plpgsql: numeric assignment to an integer variable errors out

2008-12-11 Thread Nikhil Sontakke

The following plpgsql function errors out with cvs head:

CREATE function test_assign() returns void
AS
$$ declare x int;
BEGIN
x := 9E3/2;
END
$$ LANGUAGE 'plpgsql';

postgres=# select test_assign();
ERROR:  invalid input syntax for integer: 4500.
CONTEXT:  PL/pgSQL function test_assign line 3 at assignment

We do have an existing cast from numeric to type integer. But here basically
we convert the value to string in exec_cast_value before calling int4in. And
then use of strtol in pg_atoi leads to this complaint. Guess converting the
value to string is not always a good strategy.
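The complaint can be reproduced outside the server. The following is a minimal sketch, not the actual pg_atoi source: parse_int_strict is a hypothetical helper that, like any strict strtol-based integer parser, rejects input when characters are left over after the digits. That is why a numeric value rendered as text with a decimal point cannot simply be handed to an integer input routine.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not PostgreSQL code): parse a base-10 int and
 * reject any trailing characters, as a strict integer input routine would.
 */
bool
parse_int_strict(const char *s, int *out)
{
    char   *end;
    long    val;

    assert(s != NULL && out != NULL);

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s)               /* no digits were consumed */
        return false;
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return false;           /* value does not fit in an int */
    if (*end != '\0')           /* leftover text, e.g. a decimal point */
        return false;

    *out = (int) val;
    return true;
}
```

With this helper, "4500" parses, while "4500." and "4500.0000" are rejected because strtol stops at the decimal point. Going through the existing numeric-to-integer cast instead of a text round trip would avoid the problem.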

Regards,
Nikhils
-- 
http://www.enterprisedb.com

Re: [HACKERS] COCOMO Indians

Hello Dmitry,

you are really in the wrong place.

You have to accept that your proposals are not interesting to this
community. Because PostgreSQL is under the BSD licence, you can start
your own project based on the PostgreSQL source code. There you can test
all your ideas (it would not be the first such project), but you cannot
expect to get active programmers here. First you have to show a real
product, a real project, and then you can expect some interest. Without
that, nobody will respect you or accept your proposals.

Regards
Pavel Stehule






Re: [HACKERS] COCOMO Indians

Hi, Pavel.

 you have to show some real product, real project

Money will not be approved until its amount is known.

It is not the IT solution that gets approved or rejected, but the business solution.
The budget for implementation is part of the business solution.



Dmitry (SQL50, HTML60)




Re: [HACKERS] visibility maps

On Thu, Dec 11, 2008 at 5:01 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:



 IIRC, Memory reading/writing is atomic operation. Only one CPU(hw thread)
 can access to the same memory address(es)* in same time*. The question is
 how compiler compile C code to assembler.  But this code seems to me safe.

Yeah, I think the code is safe because we are just reading a bit.

BTW, I wonder if we need to acquire an EXCLUSIVE lock while writing the
visibility map bit? Since almost (8 * 8192) data blocks map to the same
visibility map page, the lock can certainly become a hot spot. I know we
also update PageLSN during the set operation and that would require an
EXCLUSIVE lock, but is that required for consistency given that the
entire visibility map is just a hint?
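The "almost (8 * 8192)" figure can be worked out directly. The sketch below assumes one map bit per heap page, the default 8192-byte block size, and a 24-byte page header on the map page. The header size and the function names are assumptions for illustration, not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ            8192   /* default PostgreSQL block size */
#define BITS_PER_BYTE     8
#define PAGE_HEADER_BYTES 24     /* assumed page header size */

/* Heap pages tracked by one map page, at one bit per heap page. */
uint32_t
heap_pages_per_vm_page(uint32_t header_bytes)
{
    assert(header_bytes < BLCKSZ);
    return (BLCKSZ - header_bytes) * BITS_PER_BYTE;
}

/* Heap bytes covered by a single map page. */
uint64_t
heap_bytes_per_vm_page(uint32_t header_bytes)
{
    return (uint64_t) heap_pages_per_vm_page(header_bytes) * BLCKSZ;
}
```

Ignoring the header, one map page covers 65536 heap pages (512 MB of heap). With the assumed 24-byte header it covers 65344 heap pages. Either way, every backend clearing a bit for a table of up to roughly half a gigabyte goes through the lock on the same map page.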

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Refactoring SearchSysCache + HeapTupleIsValid

Peter Eisentraut [EMAIL PROTECTED] writes:
 Our code contains about 200 copies of the following code:
 tuple = SearchSysCache[Copy](FOOOID, ObjectIdGetDatum(fooid), 0, 0, 0);
 if (!HeapTupleIsValid(tuple))
     elog(ERROR, "cache lookup failed for foo %u", fooid);
 ...
 Shouldn't we try to refactor this, maybe like this:

I can't get excited about it, and I definitely do not like your
suggestion of embedding particular assumptions about the lookup keys
into the API.  What you've got here is a worse error message and a
recipe for proliferation of ad-hoc wrappers around SearchSysCache,
in return for saving a couple of lines per call site.

If we could just move the error into SearchSysCache it might be worth
doing, but I think there are callers that need the flexibility to not
fail.

regards, tom lane


Re: [HACKERS] visibility maps

2008-12-11 Thread Zdenek Kotala


Pavan Deolasee wrote:

On Thu, Dec 11, 2008 at 5:01 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:

IIRC, Memory reading/writing is atomic operation. Only one CPU(hw thread)
can access to the same memory address(es)* in same time*. The question is
how compiler compile C code to assembler.  But this code seems to me safe.


Yeah, I think the code is safe because we are just reading a bit.

BTW, I wonder if we need to acquire EXCLUSIVE lock while writing the
visibility map bit ? 


Yes, because it is not a simple write operation. You need to read the byte from
memory into a register, set the bit, and write it back. The memory write itself
is atomic, but somebody could change other bits between the read and the write.
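For comparison, here is a sketch of how the read-modify-write can be made a single indivisible step without a lock. It uses C11 <stdatomic.h>, which is an assumption of this example and a different technique from the exclusive buffer lock discussed in this thread. The map_* function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set one bit; the byte's read-modify-write happens as one atomic step. */
void
map_set_bit(_Atomic uint8_t *map, unsigned bitno)
{
    assert(map != NULL);
    atomic_fetch_or(&map[bitno / 8], (uint8_t) (1u << (bitno % 8)));
}

/* Clear one bit, leaving concurrent changes to neighbouring bits intact. */
void
map_clear_bit(_Atomic uint8_t *map, unsigned bitno)
{
    assert(map != NULL);
    atomic_fetch_and(&map[bitno / 8], (uint8_t) ~(1u << (bitno % 8)));
}

/* Test one bit with an atomic load of the containing byte. */
bool
map_test_bit(_Atomic uint8_t *map, unsigned bitno)
{
    assert(map != NULL);
    return (atomic_load(&map[bitno / 8]) & (1u << (bitno % 8))) != 0;
}
```

With a plain `map[i] |= mask`, two writers touching different bits of the same byte can each read the old value, and one update is lost. The atomic fetch-or and fetch-and close that window, although they do nothing about keeping the page LSN consistent with the bit.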


Zdenek



Re: [HACKERS] visibility maps

On Thu, Dec 11, 2008 at 7:03 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:


 Yes, because it is not simple write operation. You need to read byte from
 memory to register, set bit and write it back. Write memory itself is atomic
 but somebody could change other bits between read and write.


Yeah, but since it's just a hint, we can possibly live with some corner
cases. The benefit of avoiding contention on the VM page would easily
outweigh the downside of wrong hints.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Refactoring SearchSysCache + HeapTupleIsValid

2008-12-11 Thread Alvaro Herrera

Tom Lane wrote:

 If we could just move the error into SearchSysCache it might be worth
 doing, but I think there are callers that need the flexibility to not
 fail.

Pass a boolean flag?
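One way to read this suggestion is sketched below against a toy in-memory table rather than the real catalog cache. Everything here is hypothetical: toy_cache, toy_cache_lookup and the missing_ok parameter name are illustrations only, and abort() merely stands in for where elog(ERROR) would be raised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct CacheEntry
{
    Oid         oid;
    const char *name;
} CacheEntry;

/* Toy stand-in for a system cache; not PostgreSQL's catalog cache. */
static const CacheEntry toy_cache[] = {
    {16384, "foo"},
    {16385, "bar"},
};

/*
 * Hypothetical lookup with a missing_ok flag.  With missing_ok = false a
 * miss is reported and the program aborts; with missing_ok = true a miss
 * returns NULL and the caller decides what to do.
 */
const CacheEntry *
toy_cache_lookup(Oid key, bool missing_ok)
{
    size_t      i;

    for (i = 0; i < sizeof(toy_cache) / sizeof(toy_cache[0]); i++)
    {
        if (toy_cache[i].oid == key)
            return &toy_cache[i];
    }

    if (!missing_ok)
    {
        fprintf(stderr, "cache lookup failed for OID %u\n", key);
        abort();
    }
    return NULL;
}
```

Callers that treat a miss as an internal error pass false and drop their own check. Callers that need the flexibility to not fail pass true and test for NULL as before.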

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] visibility maps


Pavan Deolasee wrote:

On Thu, Dec 11, 2008 at 5:01 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:

IIRC, Memory reading/writing is atomic operation. Only one CPU(hw thread)
can access to the same memory address(es)* in same time*. The question is
how compiler compile C code to assembler.  But this code seems to me safe.


Yeah, I think the code is safe because we are just reading a bit.


Right. I wonder if we should declare the char *map variable as volatile, 
though. Shouldn't make a difference in practice, it's only used once in 
the function, but it feels like the right thing to do given that it is 
accessing a piece of memory without a lock.



BTW, I wonder if we need to acquire EXCLUSIVE lock while writing the
visibility map bit ? Since almost (8 * 8192) data blocks would map to
the same visibility map page, the lock can certainly become a hot
spot. I know we also update PageLSN during the set operation and that
would require EXLUSIVE lock, but is that required for consistency
given that the entire visibility map is just a hint ?


Yeah, if we accept that bits can be bogusly set. There are scenarios 
where that can happen already, but they involve crashing, not normal 
operation and clean shutdown. In the future, I'd like to move in 
the direction of making the visibility map *more* reliable, not less, 
ultimately allowing index-only scans, so I'd rather not start relaxing that.


Only the first update to a page needs to clear the bit in the visibility 
map, so I don't think it'll become a bottleneck in practice. Frequently 
updated pages will never have the bit set in the visibility map to begin 
with.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] visibility maps

On Thu, Dec 11, 2008 at 7:15 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:


 Yeah, if we accept that bits can be bogusly set. There is scenarios where
 that can happen already, but they involve crashing, not during normal
 operation and clean shut down. In the future, I'd like to move in the
 direction of making the visibility map *more* reliable, not less, ultimately
 allowing index-only-scans, so I'd rather not start relaxing that.


Do we have any tests to prove that the VM page lock does not indeed
become a bottleneck ? I can do some if we don't have already.

 Only the first update to a page needs to clear the bit in the visibility
 map, so I don't think it'll become a bottleneck in practice. Frequently
 updated pages will never have the bit set in the visibility map to begin
 with.


Well that's true only if you reject my heap-prune patch :-) Otherwise,
heap-prune will again set the bit (and I believe that's the right
thing to do)

Thanks,
Pavan



-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] COCOMO Indians

2008/12/11 Dmitry Turin [EMAIL PROTECTED]:
 Hi, Pavel.

 you have to show some real product, real project

 Money will not be confirmed, until size of it will be known.

 No IT solution can be confirmed or not, but business solution.
 Budget for implementation is part of business solution.

I thought something different. First you have to show real code, some
working prototype.

And if you really want to collaborate with us, you have to accept this:

the development process is based on small proposals; every proposal is
discussed and has to be accepted. If nobody discusses it with you, that's
a signal that there isn't real interest. In that case you have to start
your own open source project.

Pavel




 Dmitry (SQL50, HTML60)






Re: [HACKERS] visibility maps

Pavan Deolasee [EMAIL PROTECTED] writes:
 On Thu, Dec 11, 2008 at 5:01 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 IIRC, Memory reading/writing is atomic operation. Only one CPU(hw thread)
 can access to the same memory address(es)* in same time*. The question is
 how compiler compile C code to assembler.  But this code seems to me safe.

 Yeah, I think the code is safe because we are just reading a bit.

There's no such thing as just reading a bit from shared memory.
Yes, you will get *some* value, but it is not very clear which value.

In particular, on machines with weak memory ordering guarantees
(PPC for instance), we put memory fence instructions into the
lock/unlock sequences to ensure that someone who obtains a lock
guarding a shared-memory data structure will see any changes made
by the previous holder of the lock.  An access that is entirely
free of any locking primitive might get a stale value --- meaning
that it might be logically inconsistent with the apparent contents
of other parts of shared memory examined just before or after this
access.

It's likely that there are other lock/unlock operations somewhere in the
code that would prevent a visible failure; and in any case the usage of
the visibility map is constrained in a way that means getting a slightly
stale value isn't a problem.  But it needs a lot closer analysis than
the existing code comment suggests.

I've been thinking for awhile that maybe we should expose the memory
fence operations as separate primitives, similar to what's done inside
the Linux kernel.  This code would feel a lot safer if it had a read
fence just before the fetch.  IIRC there are some other places where
we could use something similar instead of needing a full lock
acquisition/release.

regards, tom lane


Re: [HACKERS] Sync Rep: First Thoughts on Code

* Simon Riggs [EMAIL PROTECTED] [081211 05:45]:
 
 On Wed, 2008-12-10 at 15:06 -0500, Aidan Van Dyk wrote:
 
  Call me thick, but I'm confused... In sync rep, there *can't be* any
  catching up to do... i.e. if the slave isn't accepting the WAL the
  master stops doing *anything*...
 
 In normal/steady state, yes, you are correct. But there is more...
 
 The simplest way to configure standby would be to freeze the primary
 while we setup the standby and then go straight into normal/steady
 state. That could mean hours of downtime for large databases, which is
 unacceptable in a feature aimed at increasing availability. So we need
 to allow the primary to continue working while the standby is setup.
 That then creates a log gap between the LSN of the primary and the LSN
 of the standby, which must be resolved.
 
 So the catchup occurs during the transient initial phase when standby is
 catching up with primary before they continue together in normal/steady
 state. 

But catchup *has* to be *done* before PostgreSQL can enter sync rep.

So, if I start PostgreSQL in sync rep mode, without any capable clients
to rep with... But I'd rather be buggered there than find out tonight
at 3am that it was in sync rep mode but wasn't really doing sync rep,
because I'd messed up something somewhere (firewall, config, password,
anything) and there was no caught-up client at the time, and I've
just lost a day's worth of my $ transactions...

 Most of the architectural discussion over last few months has been about
 the need for the initial state and how to handle it. Most of the code
 complexity also.

Well, for me, I'm quite happy with a restart (stop/start) being a
necessary downtime to move to synchronous replication.  This way, I
could see a setup routine that looks like:
1) Current production DB does normal backups/PITR/WAL archiving
2) I setup new slave, which involves
   - restore from backup + wal recover (pg_standby type)
   - Could take days+++
   - Oh well
3) Stop production
4) so, now slave is caught up...
5) Start production now in sync rep mode as master
6) start slave in sync-rep mode as slave...

So downtime would be limited to the time from the old postmaster
shutdown to the time the slave has replayed the last WAL and connected
to the restarted postmaster as a sync rep slave...

Or am I way too naive to think that a small downtime to switch from
non-sync-rep to sync-rep is acceptable...

a.
-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] COCOMO Indians

Hi, Pavel.

 Money will not be confirmed, until size of it will be known.
 I though some different, First you have to show real code

show read code before hiring

 develop process is based

I'm not interested in the development process on pgsql-hackers@postgresql.org now;
I'm interested in a method to estimate lines and time.
If you don't like my proposals, you can suggest other proposals instead.
I assume PMs may not be here (on the list), and (nominal) PMs who are here may
not use any method and may work at random.

Let's leave my SQL5, and come back to the topic:
  1.1) We divide the capabilities into elementary features, find analogues in
already written code, and suppose e.g. that the number of lines for
'create timer' will be similar to 'create function', and that implementing
'create timer' is easier than implementing 'create function' (because it
already has a prototype in 'create function', and copying source code is
possible)
  1.2) We calculate cost and time by COCOMO http://en.wikipedia.org/wiki/Cocomo
How relevant is this estimation?


Dmitry (SQL50, HTML60)




Re: [HACKERS] Sync Rep: First Thoughts on Code

* Fujii Masao [EMAIL PROTECTED] [081211 05:25]:
 
 - standalone
   The primary doesn't archive the WAL only during replication. If replication
   is not in progress, the primary archives the WAL. That is, the primary
   switches the modes whenever replication starts / ends.

That scares the heebie-jeebies out of me... I'm doing sync-rep because I
*really* *want* *my* *data*  *always* ...

I want sync-rep because I'm going to get even *stronger* guarantees on my
data (and, if hot-standby works out, load balancing too, but that's not
*my* primary desire for sync-rep)...

But I'm sure as hell *not* going to throw all my eggs into that slave's
basket and do away with my WAL archive...  Would anyone actually use
that standby mode, and if not, why complicate the code for it?

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.



[HACKERS] Re[2+]: COCOMO Indians

Hi, Pavel.

Let me replace "We" with "Somebody" for your comprehension.

 Money will not be confirmed, until size of it will be known.
 I though some different, First you have to show real code

show read code before hiring

 develop process is based

I'm not interested in the development process on pgsql-hackers@postgresql.org now;
I'm interested in a method to estimate lines and time.
If you don't like my proposals, you can suggest other proposals instead.
I assume PMs may not be here (on the list), and (nominal) PMs who are here may
not use any method and may work at random.

Let's leave my SQL5, and come back to the topic:
  1.1) Somebody divides the capabilities into elementary features, finds
analogues in already written code, and supposes e.g. that the number of lines
for 'create timer' will be similar to 'create function', and that implementing
'create timer' is easier than implementing 'create function' (because it
already has a prototype in 'create function', and copying source code is
possible)
  1.2) Somebody calculates cost and time by COCOMO
http://en.wikipedia.org/wiki/Cocomo
How relevant is this estimation?


Dmitry (SQL50, HTML60)




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

KaiGai Kohei wrote:
 Bruce Momjian wrote:
  Bruce Momjian wrote:
  KaiGai Kohei wrote:
 CREATE TABLE t (
 a   int,
 b   text
 ) WITH (ROW_LEVEL_ACL=ON);
  
  Let me outline the simplest API, assuming we are using table-level
  granularity for the security columns.
  
  CREATE TABLE would support
  
  WITH (ROWACL = TRUE/FALSE);
 
   And then in postgresql.conf we would have:
  
  default_with_rowacl
 
 Yes, I agree with it.
 
 But SE-PostgreSQL does not need a table option to control
 its availability at per-table granularity, due to its security model.
 
 Database ACL is a kind of DAC. It allows resource owners to
 set up its access rights. In other hand, SE-PostgreSQL is an
 implementation of MAC. It does not allow owners to control its
 access rights. This is the role of centralized security policy,

It is fine if you require SECEXT to be on for SE-Linux, but the option
must be available for non-SE-Linux so you can load dumps from either
Postgres configuration, and /data is compatible with both versions.

  When SE-Linux is enabled, CREATE TABLE would issue an error if SECEXT
  was false.  I can't think of a clean way to guarantee that existing
  tables have SECEXT though, which means we might need to have a missing
  'security_context' column mean default SE-Linux permissions.
 
 SE-PostgreSQL stores its security context on the security field of
 HeapTupleHeader and set HEAP_HASSECURITY of t_infomask.
 The security system column is always available, so it does not make
 any matter. When no guest is available on PGACE, HEAP_HASSECURITY of
 t_infomask is not set, so security field is not allocated and NULL
 bitmask is not polluted.

If you make an SE-Linux dump with security fields, how will that be
loadable in a non-SE-Linux Postgres database?

We are also going to need ALTER TABLE to be able to add/remove these
columns from tables, like OIDs.

   If we assume users set up Row-level ACLs for specific tables, per-table
   option is meaningful for reduction of NULL-bitmap space in the tuple
   without any NULL-values on general columns.
  
   Right.  I was hoping there was a way to have HEAP_HASSECACL control if
   the value is present or not.
  
   I sure wish others were adding ideas to this discussion.
 
 I have a plan to add a new field (declared as int2 relrowacl) into
 pg_class to show what column stores its Row-level ACLs.
 When we create a table with (ROWACL=TRUE), it implicitly add a column
 declared as security_acl aclitem[], and its attribute number is
 stored within the pg_class.relrowacl. If it has positive value,
 tuples within the table can have its individual ACLs. No-ACL is
 represented via the NULL-bitmap. If it is zero, the table does not
 have the security_acl column, and the row-level controls are simply
 ignored.

I am confused why we would want this instead of the way we do oids.

-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] visibility maps


Pavan Deolasee wrote:

On Thu, Dec 11, 2008 at 7:15 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
Do we have any tests to prove that the VM page lock does not indeed
become a bottleneck ?


No.


I can do some if we don't have already.


Oh, yes please!


Only the first update to a page needs to clear the bit in the visibility
map, so I don't think it'll become a bottleneck in practice. Frequently
updated pages will never have the bit set in the visibility map to begin
with.


Well that's true only if you reject my heap-prune patch :-) Otherwise,
heap-prune will again set the bit (and I believe that's the right
thing to do)


I'm not sure if we should set the bits very aggressively. If we're 
more aggressive about setting the bits, it also means that we have to 
clear the bits more often, increasing the likelihood of contention that 
you were worried about. Also, skipping a few pages here and there in 
vacuum doesn't make it any faster in practice, because you're reading 
sequentially. You need long contiguous regions of pages that can be 
skipped until you see a benefit.
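
The point about contiguous regions can be illustrated with a minimal Python sketch. It is not taken from any patch in this thread: the function name skippable_runs and the min_run threshold are hypothetical, and the visibility map is modeled as a plain list of booleans, one per heap page.

```python
from typing import List, Tuple


def skippable_runs(all_visible: List[bool], min_run: int) -> List[Tuple[int, int]]:
    """Return (start_page, length) for each run of all-visible pages
    that is at least min_run pages long (min_run is a made-up threshold)."""
    runs = []
    start = None
    for page, visible in enumerate(all_visible):
        if visible:
            if start is None:
                start = page          # a new run begins here
        else:
            if start is not None and page - start >= min_run:
                runs.append((start, page - start))
            start = None              # run (if any) ends at a non-visible page
    # Close a run that extends to the end of the relation.
    if start is not None and len(all_visible) - start >= min_run:
        runs.append((start, len(all_visible) - start))
    return runs


# Same number of all-visible pages in both maps, very different benefit.
scattered = [True, False] * 8            # bits set here and there
clustered = [True] * 8 + [False] * 8     # one long contiguous region
print(skippable_runs(scattered, min_run=4))
print(skippable_runs(clustered, min_run=4))
```

With scattered bits no run reaches the threshold, so a sequential read skips nothing; with the clustered map the whole first half can be skipped.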


Setting the PD_ALL_VISIBLE flag on the heap page itself more 
aggressively might be more interesting, because of the small seqscan 
optimization.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] WIP: default values for function parameters

Pavel Stehule wrote:
 2008/12/10 Pavel Stehule [EMAIL PROTECTED]:
  2008/12/10 Tom Lane [EMAIL PROTECTED]:
  Pavel Stehule [EMAIL PROTECTED] writes:
  next argument - if we accept AS for param names, then we introduce
  inconsistent behavior with SQL/XML functions.
 
  select xmlforest(c1, c2 as foo, c3) -- here foo doesn't mean
  use it as param foo,
 
  It could be read as meaning that, I think.
 
  In any case, I'm not wedded to using AS for this, and am happy to
  consider other suggestions.
 
 what do you think about this?
 
 select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)
 
 example
 select dosome(10,20,30, SET flaga = true, flagb = false)

I think AS reads more naturally because you expect the parameter to come
first, not the SET keyword.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] WIP: default values for function parameters

Pavel Stehule [EMAIL PROTECTED] writes:
 what do you think about this?

 select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)

I'm not really seeing any redeeming social value in that.  It's more
keystrokes than the other; and if you dislike AS because of possible
confusion with other usages then surely the same objection applies to
SET.

regards, tom lane


Re: [HACKERS] Sync Rep: First Thoughts on Code


On Thu, 2008-12-11 at 09:27 -0500, Aidan Van Dyk wrote:

 But catchup *has* to be *done* before PostgreSQL can enter sync rep.

Not true. Please reread the thread where Heikki questions that and I
reply. This was Fujii-san's idea, which I now agree with.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [HACKERS] Sync Rep: First Thoughts on Code


On Thu, 2008-12-11 at 09:37 -0500, Aidan Van Dyk wrote:
 * Fujii Masao [EMAIL PROTECTED] [081211 05:25]:
  
  - standalone
    The primary skips archiving the WAL only while replication is in
    progress. If replication is not in progress, the primary archives
    the WAL. That is, the primary switches modes whenever replication
    starts / ends.

 But I'm sure as hell *not* going to throw all my eggs into that
 slave's basket and do away with my WAL archive...  Would anyone
 actually use that standby mode, and if not, why complicate the code
 for it?

Sending data twice is not a requirement I ever heard expressed, nor has
the lack of ability to send it twice been voiced as a criticism for any
form of replication I'm familiar with. Ask the DRBD guys if sending data
twice is necessary or required to make replication work.

If multiple people think it's a good idea then I respect your choice of
option.

But I also think that many or perhaps most people will choose not to
send data twice and I respect that choice of option also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



[HACKERS] RE: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-11 Thread Zeugswetter Andreas OSB sIT


   Ah, that is a good point, that if we have security column which is
   usually null then we are requiring the NULL bitmask.

Yes, I think that would not be optimal, thus I think WITH SECURITY_CONTEXT
is needed.
 
 I sure wish others were adding ideas to this discussion.

One such idea would be, that the security info is already normalized.
pg_security has one row for each security_context. It is my
understanding that such a context row may already be a combination of
rights. Thus adding an extra column per subsystem to the user tables
may not be required.

You could have all info for each security subsystem in the pg_security
table. This can either be done by having one row in pg_security per
subsystem type and oid, or by having a separate column in pg_security
per subsystem.

The difficult part, imho, is that currently selecting security_context
defaults to mapping the oid to the text representation for selinux.
Concern has already been voiced in this regard. Maybe this is another
reason to not do automatic mapping, but require a specified conversion
for text output.

Or is the column name security_context and representation a standard?

This is just an idea, since I do not really think actually using more
than one security subsystem in parallel will be common.

Andreas

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Zeugswetter Andreas OSB sIT wrote:
 
Ah, that is a good point, that if we have security column which is
usually null then we are requiring the NULL bitmask.
 
 Yes, I think that would not be optimal, thus I think WITH
 SECURITY_CONTEXT is needed.
 
  I sure wish others were adding ideas to this discussion.
 
 One such idea would be, that the security info is already
 normalized.  pg_security has one row for each security_context.
 It is my understanding, that such a context row may already be
 a combination of rights. Thus adding an extra column per
 subsystem to the user tables may not be required.  
 You could have all info for each security subsystem in the
 pg_security table.  This can either be done by having one row
 in pg_security per subsystem type and oid, or by having a separate
 column in pg_security per subsystem.
 
 The imho difficult part is, that currently selecting security_context
 defaults to mapping the oid to the text representation for
 selinux. Concern has already been voiced in this regard.  Maybe
 this is another reason to not do automatic mapping, but require
 a specified conversion for text output.
 
 Or is the column name security_context and representation a
 standard ?
 
 This is just an idea, since I do not really think actually using
 more than one security subsystem in parallel will be common.

We already have this.

The idea is that the security columns will hold an OID and the OID will
point to a row in a table that contains the security rights/ACL for the
column, with multiple rows using the same rights OID.  If you change the
rights on the column the code has to check the existing entries and add
a new one if it doesn't already exist.  This does add the problem of how
to remove security rows that are no longer referenced.

--
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] WIP: default values for function parameters

2008-12-11 Thread Kevin Grittner

 Tom Lane [EMAIL PROTECTED] wrote: 
 In any case, I'm not wedded to using AS for this, and am happy to
 consider other suggestions.  But = isn't acceptable.
 
How about using a bare equals sign (or the = characters) for
parameter assignment, but require that the parameter name be prefixed
with some special character?  (My first thought was a dollar sign, but
that would cause problems in PL/pgSQL, so some other character would
need to be used.)   It seems like that could give the parser enough
context to consider the operator as parameter assignment, so it
wouldn't require making it a fully reserved word or preclude other
uses of the operator.
 
I guess it would preclude the use of whatever character was chosen as
a prefix operator in the context of a parameter list, however; which
might be a fatal flaw to the idea.
 
-Kevin


Re: [HACKERS] Sync Rep: First Thoughts on Code

* Simon Riggs [EMAIL PROTECTED] [081211 10:03]:
 
 Sending data twice is not a requirement I ever heard expressed, nor has
 the lack of ability to send it twice been voiced as a criticism for any
 form of replication I'm familiar with. Ask the DRBD guys if sending data
 twice is necessary or required to make replication work.
 
 If multiple people think it's a good idea then I respect your choice of
 option.
 
 But I also think that many or perhaps most people will choose not to
 send data twice and I respect that choice of option also.

Well, PostgreSQL has WAL, so we've already accepted the notion of
sending data twice being useful sometimes...

But I would note that the archive and streaming are both sending the
data *different* places... or at least, in my case would be...

And, also, I know WAL archiving isn't necessary for replication to work,
but it's necessary for me to sleep comfortably at night ;-)

I'm just surprised that people are willing to throw away their
backup/PITR archiving once they have a single live slave up.

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Bruce Momjian [EMAIL PROTECTED] writes:
 Let me outline the simplest API, assuming we are using table-level
 granularity for the security columns.
 CREATE TABLE would support
   WITH (ROWACL = TRUE/FALSE);
 for row-level acl and:
   WITH (SECEXT = TRUE/FALSE);
 for SE-Linux, with 'SECEXT' standing for SECurity EXTernal or
 SECurity_contEXT.

Wait a minute.  The original argument for providing SQL-driven row level
security was that it would help provide a framework for testing the code
and doing something useful with it on non-selinux platforms.  Now we
seem to be proposing two independent implementations --- which, even
if similar, could still suffer different bugs (due to copy-and-pasteos
if nothing else).  So the testing argument goes right out the window.
Also, this is getting even further afield from any capability that
anyone actually asked for.

I think there should be only *one* underlying column and that it should
be manipulable by either SQL commands or selinux.  Otherwise you're
making a lie of the primary argument for having the SQL feature at all.

It's possible that some people would want to insist that only selinux
be used to manipulate the settings, but I think that could be addressed
by a compile-time option to disable the SQL commands.

regards, tom lane


Re: [HACKERS] WIP: default values for function parameters

2008/12/11 Tom Lane [EMAIL PROTECTED]:
 Pavel Stehule [EMAIL PROTECTED] writes:
 what do you think about this?

 select fce(p1,p2,p3, SET paramname1 = val, paramname2 = val)

 I'm not really seeing any redeeming social value in that.  It's more
 keystrokes than the other; and if you dislike AS because of possible
 confusion with other usages then surely the same objection applies to
 SET.


True, it's not pretty. There is only a small set of short keywords.

Zdenek Kotala's proposal is to use

$name = value, ...
but I'm afraid it could cause problems with prepared statements in the future :(

Pavel


regards, tom lane



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

KaiGai Kohei [EMAIL PROTECTED] writes:
 I have a plan to add a new field (declared as int2 relrowacl) into
 pg_class to show what column stores its Row-level ACLs.

If you want the column to be hidden in the same way that system
columns are, I'm afraid this is a pretty bad idea.  There are way too
many places that would require weird hacks to keep from treating a
positive-numbered column as a regular user column.

regards, tom lane


Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Hannu Krosing

On Thu, 2008-12-11 at 15:20 +0200, Dmitry Turin wrote:
 Hi, Pavel.
 
  you have to show some real product, real project
 
 Money will not be confirmed, until size of it will be known.
 
 No IT solution can be confirmed or not, but business solution.
 Budget for implementation is part of business solution.

Pavel, you are most likely talking to a robot.

I suspect that somebody is trying a Turing test on us :P


-
Hannu




Re: [HACKERS] Sync Rep: First Thoughts on Code

* Heikki Linnakangas [EMAIL PROTECTED] [081211 10:09]:
 Simon Riggs wrote:
 On Thu, 2008-12-11 at 09:27 -0500, Aidan Van Dyk wrote:

 But catchup *has* to be *done* before PostgreSQL can enter sync rep.

 Not true. Please reread the thread where Heikki questions that and I
 reply. This was Fujii-san's idea, which I now agree with.

 I think the confusion here is about what exactly sync rep means in  
 this situation. It's true that you can start streaming the WAL before  
 the standby has fully caught up. But from the client's point of view,  
 there's not much point in streaming the log *synchronously* and making  
 the client wait for the acknowledgment from the standby, if the  
 acknowledgment from the standby that WAL has been streamed up to point X,  
 doesn't actually guarantee that the slave can recover all the way to  
 that point.

Quite possibly a terminology problem. In my case I said sync rep
meaning the mode such that the transaction doesn't commit successfully
for my PG client until the xlog record has been streamed to the
client... and I understand that at his presentation at PGCon, Fujii-san
said there could be possible variants on when the streaming is considered
done, based on network, slave RAM, disk, application, etc.

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] WIP: default values for function parameters

2008/12/11 Kevin Grittner [EMAIL PROTECTED]:
 Tom Lane [EMAIL PROTECTED] wrote:
 In any case, I'm not wedded to using AS for this, and am happy to
 consider other suggestions.  But = isn't acceptable.

 How about using a bare equals sign (or the = characters) for
 parameter assignment, but require that the parameter name be prefixed
 with some special character?  (My first thought was a dollar sign, but
 that would cause problems in PL/pgSQL, so some other character would
 need to be used.)   It seems like that could give the parser enough
 context to consider the operator as parameter assignment, so it
 wouldn't require making it a fully reserved word or preclude other
 uses of the operator.

maybe this combination should be safe

$name =  or $name - ...

it's not used everywhere

Pavel


 I guess it would preclude the use of whatever character was chosen as
 a prefix operator in the context of a parameter list, however; which
 might be a fatal flaw to the idea.

 -Kevin



Re: [HACKERS] Sync Rep: First Thoughts on Code


On Thu, 2008-12-11 at 17:07 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2008-12-11 at 09:27 -0500, Aidan Van Dyk wrote:
  
  But catchup *has* to be *done* before PostgreSQL can enter sync rep.
  
  Not true. Please reread the thread where Heikki questions that and I
  reply. This was Fujii-san's idea, which I now agree with.
 
 I think the confusion here is about what exactly sync rep means in 
 this situation. It's true that you can start streaming the WAL before 
 the standby has fully caught up. 

Yep.

 But from the client's point of view, 
 there's not much point in streaming the log *synchronously* and making 
 the client wait for the acknowledgment from the standby, if the 
 acknowledgment from the standby that WAL has been streamed up to point X, 
 doesn't actually guarantee that the slave can recover all the way to 
 that point.

I disagree. This morning I showed it was possible, given the
synchronisation I outlined.

There is a slight relaxation of that in the current proposal, so you
need to take that up if you see any problem there.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

 But SE-PostgreSQL does not need its table option to control
 its availability per table granuality due to its security model.

 Database ACL is a kind of DAC. It allows resource owners to
 set up its access rights. In other hand, SE-PostgreSQL is an
 implementation of MAC. It does not allow owners to control its
 access rights. This is the role of centralized security policy,
 
 It is fine if you require SECEXT to be on for SE-Linux, but the option
 must be available for non-SE-Linux so you can load dumps from either
 Postgres configuration, and /data is compatible with both versions.

It is unclear to me why you think this option is necessary to load
dumps generated by vanilla PostgreSQL.
I assume the security column should always be available, so we don't
need to add a column when SE-PostgreSQL loads a vanilla $PGDATA.

 When SE-Linux is enabled, CREATE TABLE would issue an error if SECEXT
 was false.  I can't think of a clean way to guarantee that existing
 tables have SECEXT though, which means we might need to have a missing
 'security_context' column mean default SE-Linux permissions.
 SE-PostgreSQL stores its security context on the security field of
 HeapTupleHeader and set HEAP_HASSECURITY of t_infomask.
 The security system column is always available, so it does not make
 any matter. When no guest is available on PGACE, HEAP_HASSECURITY of
 t_infomask is not set, so security field is not allocated and NULL
 bitmask is not polluted.
 
 If you make an SE-Linux dump with security fields, how will that be
 loadable in a non-SE-Linux Postgres database?

In this case, the user should not dump the database with the security field.
The patched pg_dump does not dump the security field without the
'--security-context' option.

 We are also going to need ALTER TABLE to be able to add/remove these
 columns from tables, like OIDs.

I don't agree.
The security column (not an ACL column) should always exist.
In the vanilla binary, it simply returns NULL or an empty string, so no
storage is wasted.
It is useful when we run the SE- binary with a $PGDATA generated by the vanilla one.
No stored tuple has a security field, but the security column gives DBAs a
chance to relabel them.

   If we assume users set up Row-level ACLs for specific tables, per-table
   option is meaningful for reduction of NULL-bitmap space in the tuple
   without any NULL-values on general columns.
  
   Right.  I was hoping there was a way to have HEAP_HASSECACL control if
   the value is present or not.
  
   I sure wish others were adding ideas to this discussion.

 I have a plan to add a new field (declared as int2 relrowacl) into
 pg_class to show what column stores its Row-level ACLs.
 When we create a table with (ROWACL=TRUE), it implicitly add a column
 declared as security_acl aclitem[], and its attribute number is
 stored within the pg_class.relrowacl. If it has positive value,
 tuples within the table can have its individual ACLs. No-ACL is
 represented via the NULL-bitmap. If it is zero, the table does not
 have the security_acl column, and the row-level controls are simply
 ignored.
 
 I am confused why we would want this instead of the way we do oids.

It makes the hardcoded row-acl simpler to implement.
It allows storing variable-length ACLs using the existing mechanism, and
makes it unnecessary to translate between ACLs and a raw text representation.


Thanks,


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-11 Thread Alvaro Herrera

Bruce Momjian wrote:

 The idea is that the security columns will hold an OID and the OID will
 point to a row in a table that contains the security rights/ACL for the
 column, with multiple rows using the same rights OID.  If you change the
 rights on the column the code has to check the existing entries and add
 a new one if it doesn't already exist.  This does add the problem of how
 to remove security rows that are no longer referenced.

How will it search for existing entries?  Are you saying that it will
seqscan the whole catalog of entries, looking for a match?
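
One way to avoid a sequential scan is to key the lookup on the rights text itself, the way a unique index would. The following is only a toy Python sketch of that idea, not the SE-PostgreSQL implementation: the class SecurityCatalog and its methods intern and prune are hypothetical names, and two dictionaries stand in for the table and its index.

```python
class SecurityCatalog:
    """Toy stand-in for a pg_security-like table: one row per distinct rights string."""

    def __init__(self):
        self._by_oid = {}      # oid -> rights text (plays the role of the table)
        self._by_rights = {}   # rights text -> oid (plays the role of a unique index)
        self._next_oid = 1

    def intern(self, rights: str) -> int:
        """Return the OID for rights, adding a row only if no matching row exists."""
        oid = self._by_rights.get(rights)   # indexed lookup, no scan over all rows
        if oid is None:
            oid = self._next_oid
            self._next_oid += 1
            self._by_oid[oid] = rights
            self._by_rights[rights] = oid
        return oid

    def prune(self, referenced_oids):
        """Drop rows whose OID is no longer referenced by any tuple."""
        for oid in list(self._by_oid):
            if oid not in referenced_oids:
                del self._by_rights[self._by_oid.pop(oid)]


catalog = SecurityCatalog()
first = catalog.intern("rowacl:ra,rb")
again = catalog.intern("rowacl:ra,rb")   # same rights, same OID, no new row
print(first == again)
```

The prune step still needs to know which OIDs are referenced, which is the cleanup problem mentioned in the quoted text; the sketch takes that set as an input rather than solving it.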

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] Function with default value not replacing old definition of the function

Pavel Stehule [EMAIL PROTECTED] writes:
 no, it's little bit different

 Default is only stored parameter value. You created two functions with
 two different signatures

 myfunc(int)
 myfunc(int, int)

Yeah, we already bit this bullet with variadic functions --- if you have
myfunc(int, float)
myfunc(int, variadic float[])
then it's ambiguous which one should be used for call myfunc(11, 12.5).
The sanest answer I can see is so, don't do that.
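
Both ambiguities can be seen in a minimal Python sketch that matches calls by argument count only. This is an illustration under stated assumptions, not the server's resolution code: the Signature class and candidates function are hypothetical, argument types are ignored, and no tie-breaking rule is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signature:
    name: str
    nargs: int              # number of declared parameters
    ndefaults: int = 0      # trailing parameters that have default values
    variadic: bool = False  # last parameter collects any extra arguments

    def accepts(self, ncall: int) -> bool:
        if self.variadic:
            # Assumes the variadic parameter needs at least one actual argument.
            return ncall >= self.nargs
        return self.nargs - self.ndefaults <= ncall <= self.nargs


def candidates(sigs: List[Signature], name: str, ncall: int) -> List[Signature]:
    """Return every signature that could take a call with ncall arguments."""
    return [s for s in sigs if s.name == name and s.accepts(ncall)]


# myfunc(int, float) vs. myfunc(int, variadic float[]), called as myfunc(11, 12.5)
variadic_case = [Signature("myfunc", 2), Signature("myfunc", 2, variadic=True)]
print(len(candidates(variadic_case, "myfunc", 2)))

# myfunc(int) vs. myfunc(int, int DEFAULT ...), called as myfunc(11)
default_case = [Signature("myfunc", 1), Signature("myfunc", 2, ndefaults=1)]
print(len(candidates(default_case, "myfunc", 1)))
```

In both cases two signatures accept the call, which is why defining such pairs is best avoided.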

regards, tom lane


Re: [HACKERS] Sync Rep: First Thoughts on Code


Simon Riggs wrote:

On Thu, 2008-12-11 at 09:27 -0500, Aidan Van Dyk wrote:


But catchup *has* to be *done* before PostgreSQL can enter sync rep.


Not true. Please reread the thread where Heikki questions that and I
reply. This was Fujii-san's idea, which I now agree with.


I think the confusion here is about what exactly sync rep means in 
this situation. It's true that you can start streaming the WAL before 
the standby has fully caught up. But from the client's point of view, 
there's not much point in streaming the log *synchronously* and making 
the client wait for the acknowledgment from the standby, if the 
acknowledgment from the standby that WAL has been streamed up to point X, 
doesn't actually guarantee that the slave can recover all the way to 
that point.
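
The concern can be made concrete with a small Python sketch. It is a simplified model, not code from the patch: WAL segments are represented as consecutive integers, and the function name replayable_upto and its parameters are hypothetical.

```python
def replayable_upto(start_seg: int, archived: set, streamed_from: int, streamed_to: int) -> int:
    """Return the last WAL segment the standby can replay to,
    or start_seg - 1 if it cannot replay anything.

    Replay must walk through consecutive segments beginning at start_seg.
    A segment is available if it is in the archive or in the streamed range.
    """
    seg = start_seg
    while seg in archived or streamed_from <= seg <= streamed_to:
        seg += 1
    return seg - 1


# Base backup needs replay from segment 10; streaming began at 14 and has
# been acknowledged up to 20, but segment 13 never reached the archive.
print(replayable_upto(10, {10, 11, 12}, streamed_from=14, streamed_to=20))

# Same situation once segment 13 has been archived.
print(replayable_upto(10, {10, 11, 12, 13}, streamed_from=14, streamed_to=20))
```

In the first call the acknowledgment up to segment 20 does not help, because replay stops at the gap. Whether such a gap can actually occur depends on the ordering of the log switch and the start of streaming, which is what the rest of this thread disputes.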


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)


At this point I am so confused I don't have any response.

---

KaiGai Kohei wrote:
  But SE-PostgreSQL does not need its table option to control
  its availability per table granuality due to its security model.
 
  Database ACL is a kind of DAC. It allows resource owners to
  set up its access rights. In other hand, SE-PostgreSQL is an
  implementation of MAC. It does not allow owners to control its
  access rights. This is the role of centralized security policy,
  
  It is fine if you require SECEXT to be on for SE-Linux, but the option
  must be available for non-SE-Linux so you can load dumps from either
  Postgres configuration, and /data is compatible with both versions.
 
 It is unclear for me why you thought this option is necessary to load
 dumps generated by vanilla PostgreSQL?
 I assumes the security column should be always available, so we don't
 need to add a column when SE-PostgreSQL load a vanilla $PGDATA.
 
  When SE-Linux is enabled, CREATE TABLE would issue an error if SECEXT
  was false.  I can't think of a clean way to guarantee that existing
  tables have SECEXT though, which means we might need to have a missing
  'security_context' column mean default SE-Linux permissions.
  SE-PostgreSQL stores its security context on the security field of
  HeapTupleHeader and set HEAP_HASSECURITY of t_infomask.
  The security system column is always available, so it does not make
  any matter. When no guest is available on PGACE, HEAP_HASSECURITY of
  t_infomask is not set, so security field is not allocated and NULL
  bitmask is not polluted.
  
  If you make an SE-Linux dump with security fields, how will that be
  loadable in a non-SE-Linux Postgres database?
 
 In this case, user should no dump his database with security field.
 The patched pg_dump does not dump security field without
 '--security-context' option.
 
  We are also going to need ALTER TABLE to be able to add/remove these
  columns from tables, like OIDs.
 
 I don't agree.
 The security column (not a acl column) should be always exist.
 In the vanilla binary, it simply returns NULL or empty string, so there is
 no waste of storage consumption.
 It is worthful when we run SE- binary with $PGDATA generated by vanilla one.
 Any stored tuple does not have security field, but it gives DBAs a chance to
 relabel them via the security column.
 
If we assume users set up Row-level ACLs for specific tables, per-table
option is meaningful for reduction of NULL-bitmap space in the tuple
without any NULL-values on general columns.
   
Right.  I was hoping there was a way to have HEAP_HASSECACL control if
the value is present or not.
   
I sure wish others were adding ideas to this discussion.
 
  I have a plan to add a new field (declared as int2 relrowacl) into
  pg_class to show what column stores its Row-level ACLs.
  When we create a table with (ROWACL=TRUE), it implicitly add a column
  declared as security_acl aclitem[], and its attribute number is
  stored within the pg_class.relrowacl. If it has positive value,
  tuples within the table can have its individual ACLs. No-ACL is
  represented via the NULL-bitmap. If it is zero, the table does not
  have the security_acl column, and the row-level controls are simply
  ignored.
  
  I am confused why we would want this instead of the way we do oids.
 
 It enables to implement the hardcoded row-acl more simple.
 It allows to store variable length ACLs using existing mechanism, and
 makes unnecessary to translate between ACLs and raw text representation.
 
 
 Thanks,

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-11 Thread Zeugswetter Andreas OSB sIT

 Ah, that is a good point, that if we have security 
 column which is
 usually null then we are requiring the NULL bitmask.
  
  Yes, I think that would not be optimal, thus I think WITH
  SECURITY_CONTEXT is needed.
  
   I sure wish others were adding ideas to this discussion.
  

  One such idea would be, that the security info is already
  normalized. 

I formulated that sentence badly, sorry :-(
Replace with:
Since the security info is already normalized, one such idea would be:  

  pg_security has one row for each security_context.
  It is my understanding, that such a context row may already be
  a combination of rights. Thus adding an extra column per
  subsystem to the user tables may not be required.  
  You could have all info for each security subsystem in the
  pg_security table.  This can either be done by having one row
  in pg_security per subsystem type and oid, or by having a separate
  column in pg_security per subsystem.
  
  The imho difficult part is, that currently selecting 
 security_context
  defaults to mapping the oid to the text representation for
  selinux. Concern has already been voiced in this regard.  Maybe
  this is another reason to not do automatic mapping, but require
  a specified conversion for text output.
  
  Or is the column name security_context and representation a
  standard ?
  
  This is just an idea, since I do not really think actually using
  more than one security subsystem in parallel will be common.
 
 We already have this.
 
 The idea is that the security columns will hold an OID and the OID will
 point to a row in a table that contains the security rights/ACL for the
 column, with multiple rows using the same rights OID.  If you change the
 rights on the column the code has to check the existing entries and add
 a new one if it doesn't already exist.  This does add the problem of how
 to remove security rows that are no longer referenced.

Please reread with above correction, 
and I'll also try a little differently:

Since a pg_security row already represents a combination of rights
within selinux, I do not really see why that cannot be extended to a
combination of rowacl and selinux rights or, more generally, one oid
representing a unique combination of rights within different subsystems?

A simplified example of pg_security:
oid rights
1   selinux:secret_read rowacl:ra,rb
2   selinux:unlabeled_t rowacl:ra,rb
3   selinux:secret_read rowacl:ra
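
Read as data, the simplified example could be consumed as in the Python sketch below. The "subsystem:value" encoding is taken from the example rows only; the dictionary and the function rights_for are hypothetical names for illustration, not part of any proposed catalog API.

```python
# The three example rows, keyed by oid.
pg_security = {
    1: "selinux:secret_read rowacl:ra,rb",
    2: "selinux:unlabeled_t rowacl:ra,rb",
    3: "selinux:secret_read rowacl:ra",
}


def rights_for(oid: int, subsystem: str):
    """Return the rights text that one subsystem stored under this oid, or None."""
    for part in pg_security[oid].split():
        name, _, value = part.partition(":")
        if name == subsystem:
            return value
    return None


# Each subsystem looks only at its own part of the combined row.
print(rights_for(1, "selinux"))
print(rights_for(3, "rowacl"))
```

A tuple would then carry a single oid, and each security subsystem would extract its own part from the shared row.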

Andreas

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

 Bruce Momjian [EMAIL PROTECTED] writes:
 Let me outline the simplest API, assuming we are using table-level
 granularity for the security columns.
 CREATE TABLE would support
  WITH (ROWACL = TRUE/FALSE);
 for row-level acl and:
  WITH (SECEXT = TRUE/FALSE);
 for SE-Linux, with 'SECEXT' standing for SECurity EXTernal or
 SECurity_contEXT.
 
 Wait a minute.  The original argument for providing SQL-driven row level
 security was that it would help provide a framework for testing the code
 and doing something useful with it on non-selinux platforms.

Yes,
In addition, I want to remind folks that the Row-level ACLs are not designed
based on SQL standards. Thus, I called it one of the enhanced security features.

 I think there should be only *one* underlying column and that it should
 be manipulable by either SQL commands or selinux.  Otherwise you're
 making a lie of the primary argument for having the SQL feature at all.
 
 It's possible that some people would want to insist that only selinux
 be used to manipulate the settings, but I think that could be addressed
 by a compile-time option to disable the SQL commands.

My original opinion is that users should be able to choose which enhanced
security mechanism is available on their system.
In all honesty, I don't understand why the Row-level ACLs have a privileged
position among the enhanced security features and should always be available.

Thanks,
--
KaiGai Kohei

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Zeugswetter Andreas OSB sIT wrote:
 Please reread with above correction, 
 and I'll also try a little differently:
 
 Since a pg_security row already represents a combination of rights
 within selinux, I do not really see why that cannot be extended to a
 combination of rowacl and selinux rights or, more generally, to one oid
 representing a unique combination of rights within different subsystems?
 
 A simplified example of pg_security:
 oid   rights
 1 selinux:secret_read rowacl:ra,rb
 2 selinux:unlabeled_t rowacl:ra,rb
 3 selinux:secret_read rowacl:ra

Yes, I suggested that but the patch author thought it would be better to
have two columns.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] COCOMO Indians

2008/12/11 Hannu Krosing [EMAIL PROTECTED]:
 On Thu, 2008-12-11 at 15:20 +0200, Dmitry Turin wrote:
 Hi, Pavel.

  you have to show some real product, real project

 Money will not be confirmed, until size of it will be known.

 No IT solution can be confirmed or not, but business solution.
 Budget for implementation is part of business solution.

 Pavel, you are most likely talking to a robot.

 I suspect that somebody is trying a Turing test on us :P

please, ask Zdenek about my identity :)

My English is terrible. I forgot an base from school, and now, all my
reading is c source codes only.

Next year will be lot of free time, so I hope so my English will be
better. It's terrible when I have to explain some. But on basic school
I had to learn Russian (a in this time nobody likes Russian - so my
language skills are bad).

Pavel





 -
 Hannu



Re: [HACKERS] Refactoring SearchSysCache + HeapTupleIsValid

On Thursday 11 December 2008 15:28:08 Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Our code contains about 200 copies of the following code:
  tuple = SearchSysCache[Copy](FOOOID, ObjectIdGetDatum(fooid), 0, 0, 0);
  if (!HeapTupleIsValid(tuple))
      elog(ERROR, "cache lookup failed for foo %u", fooid);
  ...
  Shouldn't we try to refactor this, maybe like this:

 I can't get excited about it, and I definitely do not like your
 suggestion of embedding particular assumptions about the lookup keys
 into the API.  What you've got here is a worse error message and a
 recipe for proliferation of ad-hoc wrappers around SearchSysCache,
 in return for saving a couple of lines per call site.

 If we could just move the error into SearchSysCache it might be worth
 doing, but I think there are callers that need the flexibility to not
 fail.

This is hardly ad hoc.  There are about 400 calls to SearchSysCache[Copy], and 
about 200 fit into the exact pattern I described.  Normally, I'd start 
refactoring at around 3 pieces of identical code.  But when 50% of all calls 
have identical code around them, it is more of an interface failure.

What about the other convenience routines in syscache.h?  They have fewer 
calls combined than this proposal alone.

There are really two very natural ways to make a syscache search.  One, you 
get an object name from a user and look it up.  If it fails, it is probably a 
user error, and you go back to the user and explain it in detailed terms.  
Two, you get an OID reference from somewhere else in the system and look it 
up.  If it fails, you bail out because the internal state of the system is 
inconsistent.  Most uses fit nicely into these two categories (with the 
notable exception of dealing with pg_attribute, which already has its ad-hoc 
wrappers).  In fact, other uses would probably be suspicious.

About the error message, I find neither version to be very good.  People see 
these messages and don't know what to do.  Considering that users do see 
these supposedly-internal messages on occasion, we could design something 
much better like

ereport(ERROR,
        (errmsg("syscache lookup failed for OID %u in cache %d for relation %s", ...),
         errdetail("This probably means the internal system catalogs or system "
                   "caches are inconsistent.  Try to restart the session.  If the "
                   "problem persists, report a bug.")));

But we should really only put this together if we can do it at a central 
place.

The problem with the idea of putting the error right into SearchSysCache(), 
possibly with a Boolean flag, is twofold:

1. It doesn't really match actual usage: either you look up an OID and want to 
fail, or you look up something else and want to handle the error yourself.  
You'd break the interface for no general benefit.

2. You can't really create good error messages if you have no information 
about the type of the key.

Maybe someone has better ideas, but 200 copies of the same poor error message 
don't make sense to me.
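
The two lookup styles described above (look up and handle a miss yourself, or look up by OID and treat a miss as an internal error) can be sketched with a toy cache. Everything here is an assumption made for illustration: `ToyTuple`, `toy_search_cache`, `toy_search_cache_by_oid` and the failure counter are hypothetical and are not the PostgreSQL syscache API. Real backend code would raise the error with elog/ereport, which does not return.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins; these are NOT the PostgreSQL definitions. */
typedef unsigned int Oid;

typedef struct {
    Oid oid;
    const char *name;
} ToyTuple;

static ToyTuple toy_cache[] = { {16384, "foo"}, {16385, "bar"} };

/* Counts reported misses so the behaviour can be observed in a test. */
int toy_lookup_failures = 0;

/* Plain lookup: returns NULL on a miss and leaves the decision to the caller. */
const ToyTuple *toy_search_cache(Oid oid)
{
    size_t i;

    for (i = 0; i < sizeof(toy_cache) / sizeof(toy_cache[0]); i++)
        if (toy_cache[i].oid == oid)
            return &toy_cache[i];
    return NULL;
}

/*
 * By-OID lookup: the miss check and the message live in one place
 * instead of being repeated at every call site.
 */
const ToyTuple *toy_search_cache_by_oid(const char *what, Oid oid)
{
    const ToyTuple *tuple = toy_search_cache(oid);

    if (tuple == NULL)
    {
        toy_lookup_failures++;
        fprintf(stderr, "cache lookup failed for %s %u\n", what, oid);
    }
    return tuple;
}
```

Callers that want to handle a miss themselves keep using the plain lookup; only the by-OID wrapper carries the central message, so the wording could be improved in one place.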

Re: [HACKERS] posix_fadvise v22

2008-12-11 Thread Greg Stark

I'll send another patch with at least 1 and 3 fixed and hunt around  
again for a header file to put this guc into.


On 10 Dec 2008, at 04:22, ITAGAKI Takahiro [EMAIL PROTECTED] 
 wrote:



Hello,

Gregory Stark [EMAIL PROTECTED] wrote:

Here's an update to eliminate two small bitrot conflicts.


I read your patch with interest, but found some trivial bad manners.

* LET_OS_MANAGE_FILESIZE is already obsoleted.
You don't have to cope with the option.


Huh I didn't realize that. I guess the idea is  that users just  
configure a very large segment size to get the old behaviour.





* Type mismatch in prefetch_pages
A variable prefetch_pages is defined as unsigned or int
in some places. Why don't you define it only once in a header
and include the header in source files?


Just... Which header?



* Assignment to prefetch_pages
What does +0.99 mean here?
  [assign_io_concurrency()]
  +prefetch_pages = new_prefetch_pages+0.99;
You want to do as follows, right?
  +prefetch_pages = (int) ceil(new_prefetch_pages);


Sure
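
The difference between the two assignments quoted above can be sketched in a few lines. The function names are hypothetical, and the round-up helper is written without <math.h> so that the sketch builds without linking libm; for values within int range it is meant to give the same result as (int) ceil(new_prefetch_pages).

```c
/* Form quoted from the patch: add 0.99, then truncate toward zero. */
int prefetch_pages_add_099(double new_prefetch_pages)
{
    return (int) (new_prefetch_pages + 0.99);
}

/*
 * Round up to the next integer.  Truncation toward zero already rounds
 * negative values up, so only a positive fractional part needs the +1.
 */
int prefetch_pages_round_up(double new_prefetch_pages)
{
    int truncated = (int) new_prefetch_pages;

    return (new_prefetch_pages > (double) truncated) ? truncated + 1 : truncated;
}
```

For a value such as 2.005 the first form yields 2 while the second yields 3. For non-negative inputs the two agree when the fractional part is zero or roughly 0.01 and above.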


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 04:52:51 Bruce Momjian wrote:
   We do have a per-row HEAP_HASOID bit, so I wonder if we can have a
   HEAP_HASSEC bit too.  Right now the HEAP_HASOID is controlled by the
   CREATE/ALTER table;
 
  The current patch adds the HEAP_HASSECURITY bit to t_infomask. :-)
  When it is false, its security field is not available and not allocated.

 Good.

This is probably OK, but if you want to save a bit or generalize it, it might 
be worth considering using the normal null bitmap and nullity everywhere 
instead of individual HEAP_HASTHISORTHAT bits for every feature.

Of course, if we expect that most rows will have no security information, this 
tradeoff might end up on the wrong side of the equation.

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)


Bruce Momjian wrote:

At this point I am so confused I don't have any response.


Are you discussing the case where we start PostgreSQL with a $PGDATA
generated by a different binary?

First, please consider the case where we start SE-PostgreSQL with a
$PGDATA generated by the vanilla binary.
(1) In this case, no stored tuple has HEAP_HASSECURITY set in its
t_infomask, so SE-PostgreSQL treats each one as unlabeled_t.

(2) In addition, we assumed the security system column is always available,
independent of the compile-time option. Thus, existing table definitions
have the security system column within their pg_attribute entries.

(3) Thus, SE-PostgreSQL allows users to access the security field of existing
tuples via the security column. It will probably return the unlabeled_t
security context, so we have to label the tuples properly.

(4) From another point of view, how does the security system column work in
vanilla PostgreSQL? Because no guest is available on PGACE,
HEAP_HASSECURITY is never set. This means we cannot set any value in
the security field via the security system column, and reading the column
always returns dummy data (empty text).

In the reverse case, vanilla PostgreSQL simply ignores the security
field regardless of HEAP_HASSECURITY. When it inserts or updates a tuple,
the new tuple has no security field and loses its security attribute.
This is quite natural.
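
The rule that a tuple without the flag is treated as unlabeled can be sketched as follows. This is a toy model under assumptions: the bit value, the struct layout, `TOY_UNLABELED_OID` and the function name are all hypothetical, and the real t_infomask bit and header layout are defined by the patch, not by this sketch.

```c
/* Hypothetical bit value; not the one used by the patch. */
#define TOY_HEAP_HASSECURITY 0x0010

/* Hypothetical identifier standing for the unlabeled_t context. */
#define TOY_UNLABELED_OID 0u

/* Toy tuple header: the security field only counts when the flag is set. */
typedef struct {
    unsigned short t_infomask;
    unsigned int security_oid;
} ToyTupleHeader;

/*
 * Return the tuple's security identifier, or the unlabeled default when
 * the tuple was written without a security field (for example by a
 * vanilla binary).
 */
unsigned int toy_tuple_get_security(const ToyTupleHeader *tup)
{
    if (tup->t_infomask & TOY_HEAP_HASSECURITY)
        return tup->security_oid;
    return TOY_UNLABELED_OID;
}
```

A tuple written by a vanilla binary never has the flag set, so it reads back as unlabeled regardless of what happens to sit in the field.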

Does it help you to understand?


---

KaiGai Kohei wrote:

But SE-PostgreSQL does not need a table option to control
its availability at per-table granularity, due to its security model.

Database ACL is a kind of DAC. It allows resource owners to
set up their access rights. On the other hand, SE-PostgreSQL is an
implementation of MAC. It does not allow owners to control
access rights. This is the role of the centralized security policy,

It is fine if you require SECEXT to be on for SE-Linux, but the option
must be available for non-SE-Linux so you can load dumps from either
Postgres configuration, and /data is compatible with both versions.

It is unclear to me why you think this option is necessary to load
dumps generated by vanilla PostgreSQL.
I assume the security column should always be available, so we don't
need to add a column when SE-PostgreSQL loads a vanilla $PGDATA.


When SE-Linux is enabled, CREATE TABLE would issue an error if SECEXT
was false.  I can't think of a clean way to guarantee that existing
tables have SECEXT though, which means we might need to have a missing
'security_context' column mean default SE-Linux permissions.

SE-PostgreSQL stores its security context in the security field of
HeapTupleHeader and sets HEAP_HASSECURITY in t_infomask.
The security system column is always available, so this does not
matter. When no guest is available on PGACE, HEAP_HASSECURITY in
t_infomask is not set, so the security field is not allocated and the NULL
bitmap is not polluted.

If you make an SE-Linux dump with security fields, how will that be
loadable in a non-SE-Linux Postgres database?

In this case, the user should not dump the database with the security field.
The patched pg_dump does not dump the security field without the
'--security-context' option.


We are also going to need ALTER TABLE to be able to add/remove these
columns from tables, like OIDs.

I don't agree.
The security column (not an acl column) should always exist.
In the vanilla binary, it simply returns NULL or an empty string, so no
storage is wasted.
It is useful when we run the SE- binary with a $PGDATA generated by the vanilla one.
No stored tuple has a security field, but it gives DBAs a chance to
relabel them via the security column.


  If we assume users set up Row-level ACLs for specific tables, per-table
  option is meaningful for reduction of NULL-bitmap space in the tuple
  without any NULL-values on general columns.
 
  Right.  I was hoping there was a way to have HEAP_HASSECACL control if
  the value is present or not.
 
  I sure wish others were adding ideas to this discussion.

I have a plan to add a new field (declared as int2 relrowacl) to
pg_class to show which column stores its Row-level ACLs.
When we create a table with (ROWACL=TRUE), it implicitly adds a column
declared as security_acl aclitem[], and its attribute number is
stored in pg_class.relrowacl. If it has a positive value,
tuples within the table can have their individual ACLs. No-ACL is
represented via the NULL-bitmap. If it is zero, the table does not
have the security_acl column, and the row-level controls are simply
ignored.
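
The three cases described above (no security_acl column, a column with a NULL entry, a column holding an ACL) can be sketched as a small classifier. The enum, the function name and the treatment of negative values are assumptions made for illustration and are not taken from the patch; the sketch deliberately does not say what access a missing ACL implies, since that is not stated here.

```c
#include <stdbool.h>

/* Hypothetical classification of a row under the relrowacl scheme. */
typedef enum {
    ROWACL_IGNORED,   /* table has no security_acl column */
    ROWACL_NO_ACL,    /* column exists, but this row's entry is NULL */
    ROWACL_HAS_ACL    /* column exists and this row carries an ACL */
} ToyRowAclState;

/*
 * relrowacl: attribute number of security_acl as stored in pg_class,
 *            zero when the table was created without ROWACL.
 * acl_attr_is_null: stands in for the row's NULL-bitmap bit.
 * Negative relrowacl values are not described in the proposal; treating
 * them like zero is an assumption of this sketch.
 */
ToyRowAclState toy_row_acl_state(short relrowacl, bool acl_attr_is_null)
{
    if (relrowacl <= 0)
        return ROWACL_IGNORED;
    if (acl_attr_is_null)
        return ROWACL_NO_ACL;
    return ROWACL_HAS_ACL;
}
```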

I am confused why we would want this instead of the way we do oids.

It makes the hardcoded row-acl simpler to implement.
It allows variable-length ACLs to be stored using the existing mechanism, and
makes it unnecessary to translate between ACLs and a raw text representation.


Thanks,





Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 17:43:38 KaiGai Kohei wrote:
 In addition, I want folks to remind that the Row-level ACLs are not
 designed based on SQL standards. Thus, I called it one of the enhanced
 securities.

We have a lot of things in our code that are nonstandard, beyond the standard, 
enhanced, or pretty cool.  We don't call any of those interfaces "enhanced" 
and lay out our code based on that.  So I don't really buy the PGACE 
premise.  Either your code is so modular that it is a separate plugin, which 
is probably not appropriate here, or it is really built-in.  Someone, I 
forgot who, recently wrote, a patch should make the code look as if the patch 
had been in there all along.  Yes.  Clear internal interfaces are also great, 
but don't lay out your code to create hooks or interfaces just because you 
are afraid to paste your code in someone else's source file.

Re: [HACKERS] visibility maps

On Thu, Dec 11, 2008 at 8:09 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:


 I'm not sure if we should set the bits in very aggressively. If we're more
 aggressive about setting the bits, it also means that we have to clear the
 bits more often, increasing the likelihood of contention that you were
 worried about.

Well, I would rather set bits aggressively and reduce contention by
changing the lock type. If HOT is working well, VACUUM will have very
few things to do, but visibility map wouldn't help as much as it can
unless we set the bits after pruning.

Another thing I noticed is that since VACUUM tries to set the bit in
the first phase, it's working only because HOT prunes DEAD tuples just
before we do another scan on line pointers (which I had earlier talked
about getting rid of. Maybe it's time I do that). Otherwise, the
visibility bit won't be set even though the DEAD tuples will be
removed in the second scan and the rest are all LIVE tuples. So if we
want to take that other scan of line pointers out of the first pass
at all, we should rather push the work of setting bits into the prune
code.

Thanks,
Pavan

-- 

Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] COCOMO Indians

2008-12-11 Thread Hannu Krosing

On Thu, 2008-12-11 at 16:48 +0100, Pavel Stehule wrote:
 2008/12/11 Hannu Krosing [EMAIL PROTECTED]:
  On Thu, 2008-12-11 at 15:20 +0200, Dmitry Turin wrote:
  Hi, Pavel.
 
   you have to show some real product, real project
 
  Money will not be confirmed, until size of it will be known.
 
  No IT solution can be confirmed or not, but business solution.
  Budget for implementation is part of business solution.
 
  Pavel, you are most likely talking to a robot.
 
  I suspect that somebody is trying a Turing test on us :P
 
 please, ask Zdenek about my identity :)

Actually I was thinking that Turin is a clever name for a Turing test
machine .

 My English is terrible. I forgot an base from school, and now, all my
 reading is c source codes only.
 
 Next year will be lot of free time, so I hope so my English will be
 better. It's terrible when I have to explain some. But on basic school
 I had to learn Russian (a in this time nobody likes Russian - so my
 language skills are bad).

--
Hannu



Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

 I think there should be only *one* underlying column and that it should
 be manipulable by either SQL commands or selinux.  Otherwise you're
 making a lie of the primary argument for having the SQL feature at all.

I agree that we're getting pretty far afield from the original
proposal, but I don't think it's a good idea to foreclose the option
of ever supporting MAC and DAC in the same executable.  Whichever one
the vendor decides to ship, I have to recompile if I want the other.
There's a good chance that most people will use NEITHER feature, but
it isn't nice if one of the two is easily available and the other is
much harder.

...Robert

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 17:09:25 Tom Lane wrote:
 I think there should be only *one* underlying column and that it should
 be manipulable by either SQL commands or selinux.  Otherwise you're
 making a lie of the primary argument for having the SQL feature at all.

Well, an SQL-manipulated row security column will probably have a content like

{joe=rw/bob,staff=r/bob}

An SELinux-aware row security column will probably have a content like

   blah_t:foo_t:quux_t

And a Solaris TX-aware security column will probably have a content like

   Classified

How can we stick all of these in the same column at the same time?

Re: [HACKERS] COCOMO Indians

2008/12/11 Hannu Krosing [EMAIL PROTECTED]:
 On Thu, 2008-12-11 at 16:48 +0100, Pavel Stehule wrote:
 2008/12/11 Hannu Krosing [EMAIL PROTECTED]:
  On Thu, 2008-12-11 at 15:20 +0200, Dmitry Turin wrote:
  Hi, Pavel.
 
   you have to show some real product, real project
 
  Money will not be confirmed, until size of it will be known.
 
  No IT solution can be confirmed or not, but business solution.
  Budget for implementation is part of business solution.
 
  Pavel, you are most likely talking to a robot.
 
  I suspect that somebody is trying a Turing test on us :P

 please, ask Zdenek about my identity :)

 Actually I was thinking that Turin is a clever name for a Turing test
 machine .

oh, I pass to.

:) or some assertive man



 My English is terrible. I forgot an base from school, and now, all my
 reading is c source codes only.

 Next year will be lot of free time, so I hope so my English will be
 better. It's terrible when I have to explain some. But on basic school
 I had to learn Russian (a in this time nobody likes Russian - so my
 language skills are bad).

 --
 Hannu




Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)


Peter Eisentraut wrote:

On Thursday 11 December 2008 17:09:25 Tom Lane wrote:

I think there should be only *one* underlying column and that it should
be manipulable by either SQL commands or selinux.  Otherwise you're
making a lie of the primary argument for having the SQL feature at all.


Well, an SQL-manipulated row security column will probably have a content like

{joe=rw/bob,staff=r/bob}

An SELinux-aware row security column will probably have a content like

   blah_t:foo_t:quux_t

And a Solaris TX-aware security column will probably have a content like

   Classified

How can we stick all of these in the same column at the same time?


Choosing it with a compile-time option is the simplest approach.

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 17:04:05 Bruce Momjian wrote:
 The idea is that the security columns will hold an OID and the OID will
 point to a row in a table that contains the security rights/ACL for the
 column, with multiple rows using the same rights OID.

That sounds somewhat scary for a number of reasons:

1. Running out of OIDs, the main reason why we got rid of OIDs in user tables 
by default.  This would essentially put them back.

2. You are implying some kind of ACL unification algorithm, to combine 
identical ACLs under one ID.  How will that work, and how will it be managed?

3. The performance impact of having to look somewhere else for every row 
fetched.  If you propose a cache, note that this cache has potentially one 
possible entry for every row in the database.  That would need significant 
thought and tuning.

4. Size scalability of the whole thing.  When using IDs as references is being 
proposed, somewhere in there is a total size limitation for a row-security 
enabled database.

Even if you manage to solve #2, is this cleanup feasible to run on a database 
that has run into the limits of #4?

I suppose that SELinux in the kernel addresses these issues somehow (e.g. 
caching), but what would the SQL-only solution do?

Re: [HACKERS] posix_fadvise v22

Greg Stark [EMAIL PROTECTED] writes:
 A variable prefetch_pages is defined as unsigned or int
 in some places. Why don't you define it only once in a header
 and include the header in source files?

 Just... Which header?

MHO: the header that goes with the source file that is most concerned with
implementing the variable's behavior (which is also the file that should
have the actual variable definition).

regards, tom lane

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Peter Eisentraut [EMAIL PROTECTED] writes:
 On Thursday 11 December 2008 17:09:25 Tom Lane wrote:
 I think there should be only *one* underlying column and that it should
 be manipulable by either SQL commands or selinux.  Otherwise you're
 making a lie of the primary argument for having the SQL feature at all.

 Well, an SQL-manipulated row security column will probably have a content like

 {joe=rw/bob,staff=r/bob}

 An SELinux-aware row security column will probably have a content like

blah_t:foo_t:quux_t

 And a Solaris TX-aware security column will probably have a content like

Classified

 How can we stick all of these in the same column at the same time?

Why would we want to?  I think one column that can hold any of these
ought to be sufficient.  I certainly don't care for the idea that we
might invent still a third column for Solaris TX at some future time.

regards, tom lane

Re: [HACKERS] WIP: default values for function parameters

On Thursday 11 December 2008 17:11:28 Pavel Stehule wrote:
 maybe this combination should be safe

 $name =  or $name - ...

 it's not used everywhere

Why don't you actually just implement the whole thing first using a random, 
simple, and nonconflicting syntax?

Adjusting the syntax to something we can reach consensus on should be a change 
of about at most 10 lines at the end.

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 18:24:54 KaiGai Kohei wrote:
 Peter Eisentraut wrote:
  On Thursday 11 December 2008 17:09:25 Tom Lane wrote:
  I think there should be only *one* underlying column and that it should
  be manipulable by either SQL commands or selinux.  Otherwise you're
  making a lie of the primary argument for having the SQL feature at all.
 
  Well, an SQL-manipulated row security column will probably have a content
  like
 
  {joe=rw/bob,staff=r/bob}
 
  An SELinux-aware row security column will probably have a content like
 
 blah_t:foo_t:quux_t
 
  And a Solaris TX-aware security column will probably have a content like
 
 Classified
 
  How can we stick all of these in the same column at the same time?

 To choose it on compile-time option is the most simple approach.

As mentioned before, compile-time options to choose between these variants in 
a mutually exclusive manner are not acceptable.

Plus, using two of these together, or even three, is certainly useful and 
reasonable in some uses.

Re: [HACKERS] Refactoring SearchSysCache + HeapTupleIsValid

2008-12-11 Thread Alvaro Herrera

Peter Eisentraut wrote:

 About the error message, I find neither version to be very good.  People see 
 these messages and don't know what to do.

I agree.  People see this:

ERROR: cache lookup failure for constraint 123123123

and they think it means the same as this:

ERROR: cache lookup failure for relation 456456456

The difference is subtle for people that are not -hackers regulars (I
had a case of this just yesterday); they immediately start to think
about temp tables and plpgsql functions, even with the first error
message, even when we say that we solved the underlying problem.

-- 
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] WIP: default values for function parameters

2008/12/11 Peter Eisentraut [EMAIL PROTECTED]:
 On Thursday 11 December 2008 17:11:28 Pavel Stehule wrote:
 maybe this combination should be safe

 $name =  or $name - ...

 it's not used everywhere

 Why don't you actually just implement the whole thing first using a random,
 simple, and nonconflicting syntax?

 Adjusting the syntax to something we can reach consensus on should be a change
 of about at most 10 lines at the end.


this is done

I did it today, so I have workable WIP prototype for ADA(Oracle)
syntax. Change of some syntax will not be really problem.

Pavel

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Peter Eisentraut wrote:
 On Thursday 11 December 2008 17:04:05 Bruce Momjian wrote:
  The idea is that the security columns will hold an OID and the OID will
  point to a row in a table that contains the security rights/ACL for the
  column, with multiple rows using the same rights OID.
 
 That sounds somewhat scary for a number of reasons:
 
 1. Running out of OIDs, the main reason why we got rid of OIDs in user tables 
 by default.  This would essentially put them back.
 
 2. You are implying some kind of ACL unification algorithm, to combine 
 identical ACLs under one ID.  How will that work, and how will it be managed?
 
 3. The performance impact of having to look somewhere else for every row 
 fetched.  If you propose a cache, note that this cache has potentially one 
 possible entry for every row in the database.  That would need significant 
 thought and tuning.
 
 4. Size scalability of the whole thing.  When using IDs as references is being 
 proposed, somewhere in there is a total size limitation for a row-security 
 enabled database.
 
 Even if you manage to solve #2, is this cleanup feasible to run on a database 
 that has run into the limits of #4?
 
 I suppose that SELinux in the kernel addresses these issues somehow (e.g. 
 caching), but what would the SQL-only solution do?

Agreed.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] posix_fadvise v22

2008-12-11 Thread Greg Stark

On Thu, Dec 11, 2008 at 4:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark greg.st...@enterprisedb.com writes:
 A variable prefetch_pages is defined as unsigned or int
 in some places. Why don't you define it only once in a header
 and include the header in source files?

 Just... Which header?

 MHO: the header that goes with the source file that is most concerned with
 implementing the variable's behavior (which is also the file that should
 have the actual variable definition).

Well the trick here is that the variable actually affects how many
PrefetchBuffer() calls *callers* should make. The callers are various
places which are doing lots of ReadBuffer calls and know what buffers
they'll need in the future. The main places are in
nodeBitmapHeapScan.c and nbtsearch.c. Neither of those is remotely
relevant.

I think I'm settling on the idea that it should be in the same place as
the PrefetchBuffer() prototype since anyone who needs prefetch_buffers
will need that as well (except for guc.c). So I'll put it in bufmgr.h
for now.



-- 
greg


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

On Thursday 11 December 2008 18:32:50 Tom Lane wrote:
  How can we stick all of these in the same column at the same time?

 Why would we want to?

Because we want to use SQL-based row access control and SELinux-based row 
access control at the same time.  Isn't this exactly one of the objections 
upthread?  Both must be available at the same time.

We can debate the merits of having, say, SELinux plus Solaris TX at the same 
time, but if we can have two as per previous paragraph, we should design for 
several.

 I think one column that can hold any of these 
 ought to be sufficient.  I certainly don't care for the idea that we
 might invent still a third column for Solaris TX at some future time.

Yes, it is certainly more appealing to have one column describing all access 
rights.

In fact, if we extend the ACL storage structure to store external access 
control information, we might also consider using that for system object 
access.  So instead of adding a column to pg_class for SELinux-controlled 
access to tables, we would just reuse relacl.


Re: [HACKERS] benchmarking the query planner

Robert Haas robertmh...@gmail.com writes:
Ah, that makes sense. Here's a test case based on Greg's. This is
definitely more than linear once you get above about n = 80, but it's
not quadratic either. n = 1000 is only 43x n = 80, and while that's
surely more than 1000/80 = 12.5, it's also a lot less than (1000/80)^2
= 156.25.
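
As a quick check of the arithmetic in the quoted paragraph, the growth can be expressed as an effective exponent. This is only a restatement of the numbers given above (43x, 1000, 80); nothing here is measured independently.

```python
import math

ratio_n = 1000 / 80      # growth in n: 12.5
ratio_time = 43          # observed growth in time, as reported above

# If time ~ n^k between these two points, then k = log(43) / log(12.5).
exponent = math.log(ratio_time) / math.log(ratio_n)

print(f"linear would be {ratio_n}x, quadratic {ratio_n ** 2}x, observed {ratio_time}x")
print(f"effective exponent: {exponent:.2f}")
```

The exponent comes out at roughly 1.5, i.e. between linear and quadratic, which matches the description.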

Yeah, that's not too surprising. There's a linear cost associated with
fetching/deconstructing the stats array, and a quadratic cost in the
comparison loop. What these results say is that the upper limit of 1000
keeps you from getting into the range where the linear component is
completely negligible. If you plot the results and try to fit a curve
like c1*x^2 + c2*x to them, you get a very good fit for all the points
above about 80. Below that, the curve has a flatter slope, indicating
a smaller linear cost component. The values in this test are about 100
bytes each, so the breakpoint at 80 seems to correspond to what happens
when the stats array no longer fits in one page.
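
For anyone who wants to try the c1*x^2 + c2*x fit themselves, here is a minimal sketch using the text-column timings reported in this message. The function name and the choice of a plain least-squares fit through the origin are assumptions made for illustration; the message does not say how the curve was fitted.

```python
# (stats target, time) pairs copied from the text-column timings in this message.
timings = [
    (10, 1.587), (20, 1.997), (30, 2.208), (40, 2.499), (50, 2.734),
    (60, 3.048), (70, 3.368), (80, 3.706), (90, 4.686), (100, 6.418),
    (150, 10.016), (200, 13.598), (250, 17.905), (300, 22.777),
    (400, 33.471), (500, 46.394), (600, 61.185), (700, 77.664),
    (800, 96.304), (900, 116.615), (1000, 140.117),
]


def fit_quadratic_through_origin(points):
    """Least-squares fit of t = c1*x^2 + c2*x via the 2x2 normal equations."""
    s_x4 = sum(x ** 4 for x, _ in points)
    s_x3 = sum(x ** 3 for x, _ in points)
    s_x2 = sum(x ** 2 for x, _ in points)
    s_x2t = sum(x ** 2 * t for x, t in points)
    s_xt = sum(x * t for x, t in points)
    det = s_x4 * s_x2 - s_x3 * s_x3
    # Cramer's rule on [[s_x4, s_x3], [s_x3, s_x2]] @ [c1, c2] = [s_x2t, s_xt]
    c1 = (s_x2t * s_x2 - s_x3 * s_xt) / det
    c2 = (s_x4 * s_xt - s_x3 * s_x2t) / det
    return c1, c2


# Fit only the points above the breakpoint at about 80.
c1, c2 = fit_quadratic_through_origin([p for p in timings if p[0] > 80])
for x, t in timings:
    print(f"{x:5d}  measured {t:8.3f}  fitted {c1 * x * x + c2 * x:8.3f}")
```

Comparing the measured and fitted columns below 80 shows where the single-page case departs from the curve.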

I replicated your test and got timings like these, using CVS HEAD with
cassert off:

10 1.587
20 1.997
30 2.208
40 2.499
50 2.734
60 3.048
70 3.368
80 3.706
90 4.686
100 6.418
150 10.016
200 13.598
250 17.905
300 22.777
400 33.471
500 46.394
600 61.185
700 77.664
800 96.304
900 116.615
1000 140.117

So this machine is a bit slower than yours, but otherwise it's pretty
consistent with your numbers. I then modified the test to use

array[random()::text,random()::text,random()::text,random()::text,random()::text,random()::text]

ie, the same data except stuffed into an array. I did this because
I know that array_eq is pretty durn expensive, and indeed:

10 1.662
20 2.478
30 3.119
40 3.885
50 4.636
60 5.437
70 6.403
80 7.427
90 8.473
100 9.597
150 16.542
200 24.919
250 35.225
300 47.423
400 76.509
500 114.076
600 157.535
700 211.189
800 269.731
900 335.427
1000 409.638

When looking at these numbers one might think the threshold of pain
is about 50, rather than 100 which is where I'd put it for the text
example. However, this is probably an extreme worst case.

On the whole I think we have some evidence here to say that upping the
default value of default_stats_target to 100 wouldn't be out of line,
but 1000 definitely is. Comments?

BTW, does anyone have an opinion about changing the upper limit for
default_stats_target to, say, 10000? These tests suggest that you
wouldn't want such a value for a column used as a join key, but
I can see a possible argument for high values in text search and
similar applications.

regards, tom lane


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Peter Eisentraut pete...@gmx.net writes:
 On Thursday 11 December 2008 18:32:50 Tom Lane wrote:
 How can we stick all of these in the same column at the same time?
 
 Why would we want to?

 Because we want to use SQL-based row access control and SELinux-based row 
 access control at the same time.  Isn't this exactly one of the objections 
 upthread?  Both must be available at the same time.

Well, the objection I was raising is that they should control the same
thing.  Otherwise we are simply inventing an invasive, high-cost,
nonstandard(*) feature that we have had zero field demand for.

regards, tom lane

(*) Worse than nonstandard: it actively breaks semantics demanded by
the standard.  If I had my druthers we would flat out reject row-level
security filtering of any kind.  I don't want us to expend a lot of
effort implementing multiple kinds.


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Let me outline the simplest API, assuming we are using table-level
  granularity for the security columns.
  CREATE TABLE would support
  WITH (ROWACL = TRUE/FALSE);
  for row-level acl and:
  WITH (SECEXT = TRUE/FALSE);
  for SE-Linux, with 'SECEXT' standing for SECurity EXTernal or
  SECurity_contEXT.
 
 Wait a minute.  The original argument for providing SQL-driven row level
 security was that it would help provide a framework for testing the code
 and doing something useful with it on non-selinux platforms.  Now we
 seem to be proposing two independent implementations --- which, even
 if similar, could still suffer different bugs (due to copy-and-pasteos
 if nothing else).  So the testing argument goes right out the window.
 Also, this is getting even further afield from any capability that
 anyone actually asked for.

Yep, no question.  The two-column idea happened because ignoring the
rowacl value for SE-Linux seemed wrong, but I am fine with it.

 I think there should be only *one* underlying column and that it should
 be manipulable by either SQL commands or selinux.  Otherwise you're
 making a lie of the primary argument for having the SQL feature at all.

True.

 It's possible that some people would want to insist that only selinux
 be used to manipulate the settings, but I think that could be addressed
 by a compile-time option to disable the SQL commands.

Yes, an SE-Linux compile could throw errors for those commands.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] benchmarking the query planner

2008-12-11 Thread Vladimir Sitnikov



 BTW, does anyone have an opinion about changing the upper limit for
 default_stats_target to, say, 10000?  These tests suggest that you
 wouldn't want such a value for a column used as a join key, but
 I can see a possible argument for high values in text search and
 similar applications.

Do you consider using hash tables?
I am not sure a hash is a perfect match here, but I guess some kind of
data structure might improve the N^2 behaviour. It looks like that would
help both array_eq (it would narrow the list of candidate arrays to a single
hash bucket) and large stats targets (I guess that would improve N^2 to N).
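
A rough sketch of the difference being suggested, comparing a nested comparison loop with a hash-table probe when matching two MCV lists. This is a simplified model, not PostgreSQL's actual eqjoinsel code: the function names and sample data are made up, and it assumes the values have a hash function consistent with their equality operator, which not every data type necessarily provides.

```python
def match_nested_loop(mcv1, mcv2):
    """O(n*m): compare every entry of one list with every entry of the other."""
    return sum(f1 * f2 for v1, f1 in mcv1 for v2, f2 in mcv2 if v1 == v2)


def match_hashed(mcv1, mcv2):
    """O(n+m): build a hash table on one list, then probe it with the other."""
    freq_by_value = dict(mcv2)          # setup cost paid once per call
    return sum(f1 * freq_by_value[v1] for v1, f1 in mcv1 if v1 in freq_by_value)


# Each entry is (value, frequency), in decreasing frequency order.
# Array values are modelled as tuples so that they are hashable.
mcv1 = [(("a", "b"), 0.30), (("c", "d"), 0.20), (("e", "f"), 0.10)]
mcv2 = [(("c", "d"), 0.25), (("a", "b"), 0.15), (("x", "y"), 0.05)]

print(match_nested_loop(mcv1, mcv2), match_hashed(mcv1, mcv2))
```

Both functions return the same sum of matched frequency products; only the number of comparisons differs.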

Regards,
Vladimir Sitnikov

Re: [HACKERS] benchmarking the query planner

Vladimir Sitnikov sitnikov.vladi...@gmail.com writes:
 Do you consider using hash tables?

Doubt it's really worth it, unless there's some way to amortize the
setup cost across multiple selectivity estimations; which would surely
complicate life.

One thing that just now occurred to me is that as long as we maintain
the convention that MCV lists are in decreasing frequency order, one can
take any prefix of the list and it's a perfectly good MCV list of less
resolution.  So one way to reduce the time taken in eqjoinsel is to set
an upper limit on the number of entries considered *by that routine*,
whereas other estimator functions could use larger lists.
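
The prefix idea can be sketched as follows. This is a simplified model rather than the real eqjoinsel: `mcv_prefix`, `matched_frequency`, and the sample lists are hypothetical, and the only property relied on is the one stated above, that entries are kept in decreasing frequency order.

```python
def mcv_prefix(mcv, limit):
    """The `limit` most common entries; valid because mcv is sorted by
    decreasing frequency, so any prefix is itself an MCV list."""
    return mcv[:limit]


def matched_frequency(mcv1, mcv2, limit=None):
    """Sum of frequency products for values present in both lists, plus the
    number of comparisons the nested loop performs."""
    if limit is not None:
        mcv1, mcv2 = mcv_prefix(mcv1, limit), mcv_prefix(mcv2, limit)
    comparisons = len(mcv1) * len(mcv2)
    total = sum(f1 * f2 for v1, f1 in mcv1 for v2, f2 in mcv2 if v1 == v2)
    return total, comparisons


left = [("red", 0.40), ("blue", 0.25), ("green", 0.10), ("pink", 0.02)]
right = [("blue", 0.50), ("red", 0.20), ("pink", 0.05), ("gray", 0.01)]

full, full_cost = matched_frequency(left, right)
capped, capped_cost = matched_frequency(left, right, limit=2)
print(f"full:   {full:.3f} using {full_cost} comparisons")
print(f"capped: {capped:.3f} using {capped_cost} comparisons")
```

In this toy data the cap drops only the rarest match, so the estimate barely moves while the comparison count falls from 16 to 4. How much is lost on real data would depend on how quickly the frequencies tail off.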

regards, tom lane


Re: [HACKERS] benchmarking the query planner

 When looking at these numbers one might think the threshold of pain
 is about 50, rather than 100 which is where I'd put it for the text
 example.  However, this is probably an extreme worst case.

 On the whole I think we have some evidence here to say that upping the
 default value of default_stats_target to 100 wouldn't be out of line,
 but 1000 definitely is.  Comments?

Do you think there's any value in making it scale based on the size of
the table?  How hard would it be?  If you made it MIN(10 + 0.001 *
estimated_rows, 100), you would probably get most of the benefit while
avoiding unnecessary overhead for small tables.
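
For concreteness, the proposed formula written out as a function. The name `scaled_stats_target` and the use of integer division (so the result is always a whole number) are assumptions for this sketch; the formula itself is the one above.

```python
def scaled_stats_target(estimated_rows, floor=10, ceiling=100):
    """MIN(10 + 0.001 * estimated_rows, 100), using integer arithmetic."""
    return min(floor + estimated_rows // 1000, ceiling)


for rows in (0, 1_000, 25_000, 90_000, 10_000_000):
    print(f"{rows:>10} rows -> target {scaled_stats_target(rows)}")
```

Under this formula the target reaches the cap of 100 at 90,000 estimated rows, so only fairly small tables would stay near the old default of 10.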

Otherwise, I am a bit concerned that 10 -> 100 may be too big a jump
for one release, especially since it may cause the statistics to get
toasted in some cases, which comes with a significant performance hit.
I would raise it to 30 or 50 and plan to consider raising it further
down the road.  (I realize I just made about a million enemies with
that suggestion.)

 BTW, does anyone have an opinion about changing the upper limit for
 default_stats_target to, say, 10000?  These tests suggest that you
 wouldn't want such a value for a column used as a join key, but
 I can see a possible argument for high values in text search and
 similar applications.

I think that's a good idea.  Given that most people probably don't
bother fiddling with this parameter at all, it doesn't strike me as much
of a foot-gun.  I think you'd need a heck of a big table to justify a
value in that range, but some people may have them.

...Robert


Re: [HACKERS] Function with default value not replacing old definition of the function

2008-12-11 Thread Dimitri Fontaine


Hi,

On 11 Dec 08 at 16:22, Tom Lane wrote:
 Yeah, we already bit this bullet with variadic functions --- if you have

 	myfunc(int, float)
 	myfunc(int, variadic float[])

 then it's ambiguous which one should be used for call myfunc(11, 12.5).

 The sanest answer I can see is "so, don't do that".



Is there any warning-level message at CREATE FUNCTION time for the
user/DBA to know he's doing something... borderline, almost shooting
himself in the foot?

I'd really welcome such a message as a reminder to seriously consider
such a choice, which I suppose would not be thought out in a lot of cases.


Regards,
-- 
dim






Re: [HACKERS] benchmarking the query planner

2008-12-11 Thread Vladimir Sitnikov



  Do you consider using hash tables?

 Doubt it's really worth it, unless there's some way to amortize the
 setup cost across multiple selectivity estimations; which would surely
 complicate life.

MCV lists are updated only during the analyze phase, aren't they? If the
setup cost is the cost of maintaining those hash tables, it is not going to
hurt much.




 One thing that just now occurred to me is that as long as we maintain
 the convention that MCV lists are in decreasing frequency order, one can
 take any prefix of the list and it's a perfectly good MCV list of less
 resolution.  So one way to reduce the time taken in eqjoinsel is to set
 an upper limit on the number of entries considered *by that routine*,
 whereas other estimator functions could use larger lists.

That makes sense; however, a linear search for a single item in a list of
10'000 elements could take a while. A hash lookup might be a better choice.

Regards,
Vladimir Sitnikov

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

2008-12-11 Thread Gregory Stark

Peter Eisentraut pete...@gmx.net writes:

 On Thursday 11 December 2008 18:32:50 Tom Lane wrote:
  How can we stick all of these in the same column at the same time?

 Why would we want to?

 Because we want to use SQL-based row access control and SELinux-based row 
 access control at the same time.  Isn't this exactly one of the objections 
 upthread?  Both must be available at the same time.

Well I don't think anyone would actually want them *at the same time*.
Combining multiple security models would mean you aren't actually following
any security model.

But I don't like the idea of making it a compile-time switch. Having to ship
separate packages for different compile-time options is really an awful
solution from the distribution's point of view. And it doesn't scale either --
if we got another such option they would have 2^n combinations.

Distributions like to set distribution-wide policies like "compile with X
support". It doesn't mean you can't run those programs without actually using
that support, as in "emacs -nw". It would be nice to have the option at
run-time of whether to use selinux or row-acl support instead.

I think we need to separate out the --enable-selinux which would merely
compile in the support for selinux from the switch to control whether we
actually have selinux turned on. Make that either an initdb option or a
per-database option like we have with collation/encoding.

Then users can install a single package and decide whether they want to use
selinux or row-acls. If their distribution decides not to compile in selinux
support they just have one choice, row-acls (or nothing).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!


Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)

* Gregory Stark st...@enterprisedb.com [081211 14:47]:
 Peter Eisentraut pete...@gmx.net writes:
 
  On Thursday 11 December 2008 18:32:50 Tom Lane wrote:
   How can we stick all of these in the same column at the same time?
 
  Why would we want to?
 
  Because we want to use SQL-based row access control and SELinux-based row 
  access control at the same time.  Isn't this exactly one of the objections 
  upthread?  Both must be available at the same time.
 
 Well I don't think anyone would actually want them *at the same time*.
 Combining multiple security models would mean you aren't actually following
 any security model.

Actually, I think people (or rather, systems) will.  No application is
going to want to use SE-linux, but OS controllers are...

Just like now, the actual people/apps/etc rely on plain unix
permissions, yet the distro still provides an SElinux policy that is
more restrictive yet...

Similarly, SElinux is going to be used *on top* of any application that's
out there, to try and enforce the "no data coming in from a secure input
leaves through a less secure output" rule, irrespective of what app-level
security (and in this case, app-level being the SQL/SCHEMA/row-level)
does itself...

So, if row-level access comes to PG in any sql form, apps and others
will use it (if only a few of them)...  And se-linux on top of that will
be used to try and enforce that the app hasn't made a mistake...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.



Re: [HACKERS] benchmarking the query planner

On Thu, Dec 11, 2008 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Vladimir Sitnikov sitnikov.vladi...@gmail.com writes:
 Do you consider using hash tables?

 Doubt it's really worth it, unless there's some way to amortize the
 setup cost across multiple selectivity estimations; which would surely
 complicate life.

 One thing that just now occurred to me is that as long as we maintain
 the convention that MCV lists are in decreasing frequency order, one can
 take any prefix of the list and it's a perfectly good MCV list of less
 resolution.  So one way to reduce the time taken in eqjoinsel is to set
 an upper limit on the number of entries considered *by that routine*,
 whereas other estimator functions could use larger lists.

To what extent will that negate the benefit of having those statistics
in the first place?

Here's another idea.  If you have a < operator, you could use a
quicksort-type strategy to partition the search space.  Pick an
arbitrary element of either list and apply it to all elements of both
lists to divide the initial problem into two problems that are each
half as large.  When the subproblems fall below some size threshold,
then solve them according to the existing algorithm.  This is O(n^2)
in the worst case, just like quicksort, but the worst case is
difficult to construct.
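
A minimal sketch of that partitioning strategy, assuming only a `<` operator on the values. The function names, the threshold of 8, and the choice of the middle element as pivot are illustrative assumptions, and the lists here hold bare values rather than (value, frequency) pairs.

```python
import random


def nested_loop_matches(a, b):
    """Existing-style O(len(a) * len(b)) comparison, using only `<`."""
    return [x for x in a for y in b if not (x < y) and not (y < x)]


def partitioned_matches(a, b, threshold=8):
    """Find values present in both lists by quicksort-style partitioning."""
    if len(a) <= threshold or len(b) <= threshold:
        return nested_loop_matches(a, b)   # small enough: existing algorithm
    pivot = a[len(a) // 2]                 # arbitrary element of one list
    a_lo = [x for x in a if x < pivot]
    a_hi = [x for x in a if pivot < x]
    b_lo = [y for y in b if y < pivot]
    b_hi = [y for y in b if pivot < y]
    # Elements equal to the pivot are neither below nor above it.
    a_eq = [x for x in a if not (x < pivot) and not (pivot < x)]
    b_eq = [y for y in b if not (y < pivot) and not (pivot < y)]
    return (partitioned_matches(a_lo, b_lo, threshold)
            + nested_loop_matches(a_eq, b_eq)
            + partitioned_matches(a_hi, b_hi, threshold))


rng = random.Random(0)
a = list(range(0, 200, 3))
b = list(range(0, 200, 5))
rng.shuffle(a)   # MCV lists are ordered by frequency, not by value
rng.shuffle(b)
print(sorted(partitioned_matches(a, b)))
```

Because the pivot is excluded from both sub-lists, each recursive call works on a strictly shorter first list, so the recursion always terminates.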

...Robert


Re: [HACKERS] Function with default value not replacing old definition of the function