date:20120306

On Mon, Mar 5, 2012 at 11:29 PM, Josh Berkus j...@agliodbs.com wrote:

 3. Pages with checksums set need to have a version marking to show
 that they are a later version of the page layout. That version number
 needs to be extensible to many later versions. Pages of multiple
 versions need to exist within the server to allow simple upgrades and
 migration.

 This is a statement of a problem; do you have a proposed solution for it?

Yes. (3) and (4) are in some ways related, so the solution was
presented further down the page.

 (3) and (4) are in conflict with each other, but there is a solution.
 We mark the block with a version number, but we don't make the
 checking dependant upon the version number. We simply avoid making any
 checks until the command to scan all blocks is complete, per point
 (2). That way we need to use 1 flag bit to mark the new version and
 zero flag bits to indicate checks should happen.

The proposed solution is extensible and currently would allow many new versions.

Thank you for your comments.

Are all the other aspects of the proposal acceptable, or are there
other options anybody would like to explore? We've discussed a range
of possible options, so the proposal can be varied as needed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] poll: CHECK TRIGGER?

Hello


 When I try to look on some multicheck form:

 a) CHECK FUNCTION ALL ON table_name
 b) CHECK TRIGGER ALL ON table_name

 then more natural form is @b (for me). Personally, I can live with
 one, both or second form, although I prefer CHECK TRIGGER.


I though some time more.

if somebody would to check all custom function, then he can write

CHECK FUNCTION ALL

what about triggers?

CHECK TRIGGER ALL

but if we don't implement CHECK TRIGGER, then this statement will look like

CHECK FUNCTION ALL ON ALL ???

and this is unclean - probably it doesn't mean - check trigger
function with any table. So this is other argument for CREATE TRIGGER.

Nice a day

Pavel


 notes?

 Regards

 Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe

2012-03-06 Thread Noah Misch

On Sun, Mar 04, 2012 at 01:02:57PM +, Simon Riggs wrote:
 More detailed thoughts show that the test in heap_beginscan_internal()
 is not right enough, i.e. wrong.
 
 We need a specific XidInMVCCSnapshot test on the relvalidxid, so it
 needs to be a specific xid, not an xmin because otherwise we can get
 concurrent transactions failing, not just older transactions.

Good point; I agree.  indcheckxmin's level of pessimism isn't appropriate for
this new check.

 If we're going freeze tuples on load this needs to be watertight, so
 some minor rework needed.
 
 Of course, if we only have a valid xid on the class we might get new
 columns added when we do repeated SELECT * statements using the same
 snapshot while concurrent DDL occurs. That is impractical, so if we
 define this as applying to rows it can work; if we want it to apply to
 everything its getting more difficult.

Excess columns seem less grave to me than excess or missing rows.  I'm having
difficulty thinking up an explanation for that opinion.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-06 Thread Albe Laurenz

Shigeru Hanada wrote:
[pgsql_fdw_v12.patch]

I know this is not the latest version, but I played around with it and
tickled a bug.
It seems to have a problem with rolled back subtransactions.

test= \d+ remote
  Foreign table laurenz.remote
Column |  Type   | Modifiers | FDW Options | Storage  | Description
   +-+---+-+--+-
id | integer | not null  | | plain|
val| text| not null  | | extended |
   Server: loopback
   FDW Options: (nspname 'laurenz', relname 'local')
   Has OIDs: no
test= BEGIN;
test= DECLARE x CURSOR FOR SELECT * FROM remote;
   DEBUG:  Remote SQL: SELECT id, val FROM laurenz.local
   DEBUG:  relid=16423 fetch_count=1
   DEBUG:  starting remote transaction with START TRANSACTION ISOLATION
LEVEL REPEATABLE READ
test= FETCH x;
id | val
   +-
 1 | one
   (1 row)
test= SAVEPOINT z;
test= ERROR OUT;
   ERROR:  syntax error at or near ERROR
   LINE 1: ERROR OUT;
test= ROLLBACK TO SAVEPOINT z;
test= FETCH x;
id | val
   +-
 2 | two
   (1 row)
test= COMMIT;
   ERROR:  could not close cursor
   DETAIL:  no connection to the server

   HINT:  CLOSE pgsql_fdw_cursor_0

The error message reported is not consistent, at one attempt the backend
crashed.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe

2012-03-06 Thread Noah Misch

On Mon, Mar 05, 2012 at 03:46:16PM -0500, Robert Haas wrote:
 On Mon, Mar 5, 2012 at 2:22 PM, Noah Misch n...@leadboat.com wrote:
  I can see this strategy applying to many relation-pertinent system catalogs.
  Do you foresee applications to non-relation catalogs?
 
 Well, in theory, we have similar issues if, say, a query uses a
 function that didn't exist at the time the snapshot as taken; the
 actual results the user sees may not be consistent with any serial
 execution schedule.  And the same could be true for any other SQL
 object.  It's unclear that those cases are as compelling as this one,
 but then again it's unclear that no one will ever want to fix them,
 either.  For example, suppose we have a view v over a table t that
 calls a function f.  Somebody alters f to give different results and,
 in the same transaction, modifies the contents of t (but no DDL).
 This doesn't strike me as a terribly unlikely scenario; the change to
 t could well be envisioned as a compensating transaction.  But now if
 somebody uses the new definition of f against the old contents of t,
 the user may fail to get what they were hoping for out of bundling
 those changes together in one transaction.

Good example.

 Now, maybe we're never going to fix those kinds of anomalies anyway,
 but if we go with this architecture, then I think the chances of it
 ever being palatable to try are pretty low.

Why?

  But it's not quite the
  same as the xmin of the row itself, because some updates might be
  judged not to matter. ?There could also be intermediate cases where
  updates are invalidating for some purposes but not others. ?I think
  we'd better get our hands around more of the problem space before we
  start trying to engineer solutions.
 
  I'm not seeing that problem. ?Any operation that would update some xmin
  horizon should set it to the greater of its current value and the value the
  operation needs for its own correctness. ?If you have something in mind that
  needs more, could you elaborate?

Simon's point about xmin vs. xid probably leads to an example.  One value is
fine for TRUNCATE, because only the most recent TRUNCATE matters.  Not all DDL
is so simple.

 Well, consider something like CLUSTER.  It's perfectly OK for CLUSTER
 to operate on a table that has been truncated since CLUSTER's snapshot
 was taken, and no serialization anomaly is created that would not have
 already existed as a result of the non-MVCC-safe TRUNCATE.  On the
 other hand, if CLUSTER operates on a table that was created since
 CLUSTER's snapshot was taken, then you have a bona fide serialization
 anomaly.

Core CLUSTER does not use any MVCC snapshot.  We do push one for the benefit
of functions called during the reindex phase, but it does not appear that you
speak of that snapshot.  Could you elaborate this example?

 Maybe not a very important one, but does that prove that
 there's no significant problem of this type in general, or just
 nobody's thought through all the cases yet?  After all, the issues
 with CREATE TABLE/TRUNCATE vs. a concurrent SELECT have been around
 for a very long time, and we're only just getting around to looking at
 them, so I don't have much confidence that there aren't other cases
 floating around out there.

Granted.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ECPG FETCH readahead

2012-03-06 Thread Noah Misch

On Tue, Mar 06, 2012 at 07:07:41AM +0100, Boszormenyi Zoltan wrote:
 2012-03-05 19:56 keltez?ssel, Noah Misch ?rta:
  Or how about a new feature in the backend, so ECPG can do
  UPDATE/DELETE ... WHERE OFFSET N OF cursor
  and the offset of computed from the actual cursor position and the 
  position known
  by the application? This way an app can do readahead and do work on rows 
  collected
  by the cursor with WHERE CURRENT OF which gets converted to WHERE OFFSET OF
  behind the scenes.
  That's a neat idea, but I would expect obstacles threatening our ability to
  use it automatically for readahead.  You would have to make the cursor a
  SCROLL cursor.  We'll often pass a negative offset, making the operation 
  fail
  if the cursor query used FOR UPDATE.  Volatile functions in the query will 
  get
  more calls.  That's assuming the operation will map internally to something
  like MOVE N; UPDATE ... WHERE CURRENT OF; MOVE -N.  You might come up with
  innovations to mitigate those obstacles, but those innovations would 
  probably
  also apply to MOVE/FETCH.  In any event, this would constitute a substantive
  patch in its own right.
 
 I was thinking along the lines of a Portal keeping the ItemPointerData
 for each tuple in the last FETCH statement. The WHERE OFFSET N OF cursor
 would treat the offset value relative to the tuple order returned by FETCH.
 So, OFFSET 0 OF == CURRENT OF and other values of N are negative.
 This way, it doesn't matter if the cursor is SCROLL, NO SCROLL or have
  the default behaviour with SCROLL in some cases. Then ECPGopen()
 doesn't have to play games with the DECLARE statement. Only ECPGfetch()
 needs to play with MOVE statements, passing different offsets to the backend,
 not what the application passed.

That broad approach sounds promising.  The main other consideration that comes
to mind is a plan to limit resource usage for a cursor that reads, say, 1B
rows.  However, I think attempting to implement this now will significantly
decrease the chance of getting the core patch features committed now.

  One way out of trouble here is to make WHERE CURRENT OF imply READHEAD
  1/READHEAD 0 (incidentally, perhaps those two should be synonyms) on the
  affected cursor.  If the cursor has some other readahead quantity declared
  explicitly, throw an error during preprocessing.
 
 I played with this idea a while ago, from a different point of view.
 If the ECPG code had the DECLARE mycur, DML ... WHERE CURRENT OF mycur
 and OPEN mycur in exactly this order, i.e. WHERE CURRENT OF appears in
 a standalone function between DECLARE and the first OPEN for the cursor,
 then ECPG disabled readahead automatically for that cursor and for that
 cursor only. But this requires effort on the user of ECPG and can be very
 fragile. Code cleanup with reordering functions can break previously
 working code.

Don't the same challenges apply to accurately reporting an error when the user
specifies WHERE CURRENT OF for a readahead cursor?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Remove extra copies of LogwrtResult.

2012-03-06 Thread Fujii Masao

On Tue, Mar 6, 2012 at 5:18 PM, Heikki Linnakangas
heikki.linnakan...@iki.fi wrote:
 Remove extra copies of LogwrtResult.

I found one typo. Attached patch fixes that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6818,6824  StartupXLOG(void)
  	else
  	{
  		/*
! 		 * Whenever Write.LogwrtResult points to exactly the end of a page,
  		 * Write.curridx must point to the *next* page (see XLogWrite()).
  		 *
  		 * Note: it might seem we should do AdvanceXLInsertBuffer() here, but
--- 6818,6824 
  	else
  	{
  		/*
! 		 * Whenever LogwrtResult points to exactly the end of a page,
  		 * Write.curridx must point to the *next* page (see XLogWrite()).
  		 *
  		 * Note: it might seem we should do AdvanceXLInsertBuffer() here, but

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-06 Thread Shigeru Hanada

(2012/03/06 6:19), Tom Lane wrote:
 I've committed the PlanForeignScan API change, with that change and
 some other minor editorialization.  The pgsql_fdw patch now needs an
 update, so I set it back to Waiting On Author state.

Thanks.

I've revised pgsql_fdw to catch up to this change, but I'll post those
patches after fixing the bug reported by Albe Laurenz.

BTW, what I did for this change is also needed for other existing FDWs
to make them available on 9.2.  So, I'd like to add how to change FDWs
for 9.2 to SQL/MED wiki page, where probably most of the FDW authors check.

Regards,
-- 
Shigeru Hanada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-06 Thread Fujii Masao

On Tue, Mar 6, 2012 at 2:17 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 21.02.2012 13:19, Fujii Masao wrote:
 In some places, the spinlock insertpos_lck is taken while another
 spinlock info_lck is being held. Is this OK? What if unfortunately
 inner spinlock takes long to be taken?

 Hmm, that's only done at a checkpoint (and a restartpoint), so I doubt
 that's a big issue in practice. We had the same pattern before the
 patch, just with WALInsertLock instead of insertpos_lck. Holding a
 spinlock longer is much worse than holding a lwlock longer, but
 nevertheless I don't think that's a problem.

 No, that's NOT okay.  A spinlock is only supposed to be held across a
 short straight-line sequence of instructions.

This also strikes me that the usage of the spinlock insertpos_lck might
not be OK in ReserveXLogInsertLocation() because a few dozen instructions
can be performed while holding the spinlock

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] review: CHECK FUNCTION statement


Excerpts from Pavel Stehule's message of mar mar 06 03:43:06 -0300 2012:
 Hello
 
 * I refreshed regress tests and appended tests for multi lines query
 * There are enhanced checking of SELECT INTO statement
 * I fixed showing details and hints

Oh, I forgot to remove the do_tup_output_slot() function which I added
in some previous version but is no longer necessary.  Also, there are
two includes that I put separately in functioncmds.c that are only
necessary for the CHECK FUNCTION stuff; those should be merged in with
the other includes there.  (I was toying with the idea of moving all
that code to a new file).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Remove extra copies of LogwrtResult.

On Tue, Mar 6, 2012 at 7:06 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Mar 6, 2012 at 5:18 PM, Heikki Linnakangas
 heikki.linnakan...@iki.fi wrote:
 Remove extra copies of LogwrtResult.

 I found one typo. Attached patch fixes that.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Pg 9.2 extension install

2012-03-06 Thread Misa Simic

Hi,

I have made some pg extension for Pg 9.1 and want to deploy it on Pg
9.2dev...

When i try create extension it shows error version mismatch server is 9.2
library has made for 9.1...

How to make library for 9.2?

Thanks,

Misa

Sent from my Windows Phone

Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe

On Tue, Mar 6, 2012 at 5:43 AM, Noah Misch n...@leadboat.com wrote:
 Now, maybe we're never going to fix those kinds of anomalies anyway,
 but if we go with this architecture, then I think the chances of it
 ever being palatable to try are pretty low.

 Why?

Because it'll require at least one XID column in every system catalog
that represents an SQL catalog to plug all the cases, and I doubt very
much that we want to go there.

 Simon's point about xmin vs. xid probably leads to an example.  One value is
 fine for TRUNCATE, because only the most recent TRUNCATE matters.  Not all DDL
 is so simple.

Yep.

 Well, consider something like CLUSTER.  It's perfectly OK for CLUSTER
 to operate on a table that has been truncated since CLUSTER's snapshot
 was taken, and no serialization anomaly is created that would not have
 already existed as a result of the non-MVCC-safe TRUNCATE.  On the
 other hand, if CLUSTER operates on a table that was created since
 CLUSTER's snapshot was taken, then you have a bona fide serialization
 anomaly.

 Core CLUSTER does not use any MVCC snapshot.  We do push one for the benefit
 of functions called during the reindex phase, but it does not appear that you
 speak of that snapshot.  Could you elaborate this example?

Imagine this:

- Transaction #1 acquires a snapshot.
- Transaction #2 creates tables A, inserts a row into table B, and then commits.
- Transaction #1 tries to CLUSTER A and then select from B.

The only serial execution schedules are T1  T2, in which case the
transaction fails, or T2  T1, in which case the row is seen.  But
what actually happens is that the row isn't seen and yet the
transaction doesn't fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] review: CHECK FUNCTION statement

2012/3/6 Alvaro Herrera alvhe...@commandprompt.com:

 Excerpts from Pavel Stehule's message of mar mar 06 10:44:09 -0300 2012:

 2012/3/6 Alvaro Herrera alvhe...@commandprompt.com:
 
  Excerpts from Pavel Stehule's message of mar mar 06 03:43:06 -0300 2012:
  Hello
 
  * I refreshed regress tests and appended tests for multi lines query
  * There are enhanced checking of SELECT INTO statement
  * I fixed showing details and hints
 
  Oh, I forgot to remove the do_tup_output_slot() function which I added
  in some previous version but is no longer necessary.  Also, there are
  two includes that I put separately in functioncmds.c that are only
  necessary for the CHECK FUNCTION stuff; those should be merged in with
  the other includes there.  (I was toying with the idea of moving all
  that code to a new file).
 

 I am not sure, what you did do. Can you remove this useless code, please?

 It's just a three line function that's not called anywhere.

ok

fixed patch

Pavel


 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support


check_function-2012-03-06-2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-03-06 Thread Kohei KaiGai

Hi, Yeb.

Thanks for your reviewing and patch updates.
(and sorry my delayed response...)

I'd like to point out a case when plabel-label is NULL.

In case of sepgsql_setcon() being invoked with null argument
to reset security label of the client, but not committed yet,
the last item of the client_label_pending has null label.
(It performs as a mark of a security label being reset.)

It is a case when sepgsql_get_client_label() should return
the client_label_peer, not plabel-label.
So, I reverted some of your replacement; that assumes the
pending label is valid with Assert() check to null value.

Most of comments update are quite helpful for me.
So, I merged your revised one in this patch.

Thanks so much!

2012/3/3 Yeb Havinga yebhavi...@gmail.com:
 On 2012-02-24 17:25, Yeb Havinga wrote:

 On 2012-02-23 12:17, Kohei KaiGai wrote:

 2012/2/20 Yeb Havingayebhavi...@gmail.com:

 On 2012-02-05 10:09, Kohei KaiGai wrote:

 The attached part-1 patch moves related routines from hooks.c to
 label.c
 because of references to static variables. The part-2 patch implements
 above
 mechanism.


 I took a short look at this patch but am stuck getting the regression
 test
 to run properly.

 First, patch 2 misses the file sepgsql.sql.in and therefore the creation
 function command for sepgsql_setcon is missing.

 Thanks for your comments.

 I added the definition of sepgsql_setcon function to sepgsql.sql.in file,
 in addition to patch rebasing.


 Very brief comments due to must leave keyboard soon:

 I read the source code and played a bit with setcon and the debugger, no
 strange things found.

 Code comments / questions:


 I took the liberty to change a few things, mostly comments, in the attached
 patch:


 maybe client_label_committed is a better name for client_label_setcon?


 this change was made.


 Is the double negation in the sentence below intended?


 several comments were changed / moved. There is now one place where te
 behaviour of the different client_label variables are explained.



 sepgsql_set_client_label(), maybe add a comment to !new_label that it is
 reset to the peer label.


 done.


 Is the assert client_label_peer != NULL in sepgsql_get_client_label
 necessary?
 new_label == NULL / pending_label-label == NULL means use the peer label.
 Why not use the peer label instead?


 It turned out that pending_label-label is invariantly non null. Changed
 code to assume that and added some Asserts.



 set_label: if new_label == current label according to getcon, is it
 necessary to add to the pending list?


 this question still remains. Maybe there would be reasons to hit selinux
 with the question: can I change from A-A.


 sepgsql_subxact_callback(), could this be made easier to read by just
 taking llast(client_label_pending), assert that plabel-subid == mySubId and
 then list_delete on pointer of that listcell?


 no this was a naieve suggestion, which fails in any case of a subtransaction
 with not exactly one call to sepgsql_setcon()


 Some comments contain typos, I can spend some time on this, though I'm not
 a native english speaker so it won't be perfect.


 sgml documentation must still be added. If time permits I can spend some
 time on that tomorrow.


 regards,
 Yeb Havinga


 --
 Yeb Havinga
 http://www.mgrid.net/
 Mastering Medical Data




-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.2-sepgsql-setcon.part-2.v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-06 Thread Fujii Masao

On Tue, Mar 6, 2012 at 1:50 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
+ * An xlog-switch record consumes all the remaining space on the
+ * WAL segment. We have already reserved it for us, but we still
need
+ * to make sure it's been allocated and zeroed in the WAL buffers
so
+ * that when the caller (or someone else) does XLogWrite(), it can
+ * really write out all the zeros.

Why do we need to write out all the remaining space with zeros? In
current master, we don't do that. A recovery code ignores the data
following XLOG_SWITCH record, so I don't think that's required.

It's to keep the logic simpler. Before the patch, an xlog-switch just
initialized the next page in the WAL buffers to insert to, to be the first
page in the next segment. With this patch, we rely on a simple linear
mapping from an XLogRecPtr to the WAL buffer that should contain that page
(see XLogRecPtrToBufIdx()). Such a mapping is not possible if you sometimes
skip over pages in the WAL buffers, so we allocate the buffers for those
empty pages, too. Note that this means that an xlog-switch can blow through
all your WAL buffers.

We could probably optimize that so that you don't need to actually write()
and fsync() all the zeros, perhaps by setting a flag on the WAL buffer to
indicate that it only contains padding for an xlog-switch. However, I don't
see any easy way to avoid blowing the cache.

I'm thinking that xlog-switching happens so seldom, and typically on a
fairly idle system, so we don't need to optimize it much. I guess we should
measure the impact, though..

Online backup which forces an xlog-switch twice might be performed under
a certain amount of load. So, to avoid the performance spike during online
backup, I think that the optimization which skips write() and fsync() of
the padding bytes is helpful.

On 22.02.2012 03:34, Fujii Masao wrote:
When I ran the long-running performance test, I encountered the following
panic error.

PANIC: could not find WAL buffer for 0/FF00

0/FF00 is the xlog file boundary, so the patch seems to handle
the xlog file boundary incorrectly. In the patch, current insertion lsn
is advanced by directly incrementing XLogRecPtr.xrecoff as follows.
But to handle the xlog file boundary correctly, we should use
XLByteAdvance() for that, instead?

Thanks, fixed this, too.

I made the locking a bit more strict in WaitXLogInsertionsToFinish(), so
that it grabs the insertpos_lck to read nextslot. I previously thought that
was not necessary, assuming that reading/writing an int32 is atomic, but I'm
afraid there might be memory-ordering issues where the CurrPos of the most
recent slot has not become visible to other backends yet, while the
advancing of nextslot has.

That particular issue would be very hard to hit in practice, so I don't know
if this could explain the recovery failures that Jeff saw. I got the test
script running (thanks for that Jeff!), but unfortunately have not seen any
failures yet (aside from the issue with crossing xlogid boundary), with
either this or the older version of the patch.

Attached is a new version of the patch.

Thanks for the new patch!

In this new patch, I again was able to reproduce the assertion failure which
I described on the upthread.
http://archives.postgresql.org/message-id/CAHGQGwGRuNJ%3D_ctXwteNkFkdvMDNFYxFdn0D1cd-CqL0OgNCLg%40mail.gmail.com

$ uname -a
Linux hermes 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:50:54 UTC
2012 i686 i686 i386 GNU/Linux

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Mon, Mar 5, 2012 at 10:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
 1. We don't need them because there will be something better in a
 later release. I don't think anybody disagrees that a better solution
 is possible in the future; doubts have been expressed as to what will
 be required and when that is likely to happen. Opinions differ. We can
 and should do something now unless there is reason not to.

I don't think there's really a serious problem here.  Everyone seems
to agree that checksums are useful, and nobody even seems terribly
unhappy with the fact that this is only a 16-bit checksum.  On the
flip side I think it's quite likely that we will never wish to support
multiple checksum implementations, so for that reason I do believe
that whatever we commit first in this area needs to be something we
can live with for a very long time.  It can have limitations of
course, but we should be comfortable with the on-disk format and basic
design of the feature.

 2. Turning checksums on/off/on/off in rapid succession can cause false
 positive reports of checksum failure if crashes occur and are ignored.
 That may lead to the feature and PostgreSQL being held in disrepute.

This I do think is a problem, although not for precisely the reason
stated here.  In my experience, in data corruption situations, the
first thing customers do is blame PostgreSQL: they don't believe it's
the hardware; they accuse us of having bugs in our code.  Having a
checksum feature would be valuable, because, first, we'd perhaps
detect problems sooner and, second, people understand what checksums
are and that checksum failures really shouldn't happen unless the
hardware is bad.  More generally, one of the purposes of checksums is
to distinguish hardware failure from other possible causes of data
corruption problems.  If there are code paths where checksum failures
can happy despite the hardware being good, I think that the patch will
fail to accomplish its goal of giving us confidence that the hardware
is bad.

 This can be resolved, if desired, by having having a two-stage
 enabling process where we must issue a command that scans every block
 in the database before checksum checking begins. VACUUM is easily
 modified to the task, we just need to agree that is suitable and agree
 syntax.
 A suggestion is VACUUM ENABLE CHECKSUMS; others are possible.

If we're going to extend the syntax of VACUUM, we should use the
extensible-options syntax there - i.e. something like VACUUM
(CHECKSUMS ON).  But overall I think this proposal lacks in detail.
It seems to me that there are four states that a database (or table,
if we made it more fine-grained) could be in:

1. No blocks have checksums.

2. Checksums are being added, but some blocks may not yet have them.
Thus, it's not an error for a block to have no checksum, but if a
block appears to have a checksum, it should be correct.   All blocks
are written with checksums.

3. Checksums are fully enabled.   Checksum failures are still an
error, and apparent lack of a checksum is now an error as well, since
it must indicate something's been corrupted.  All blocks are written
with checksums.

4. Checksums are being removed, but some blocks may still have them.
Thus, it's not an error for a block to have no checksum, but any
still-remaining checksums should be correct (though possibly we ought
not to complain if they aren't, to provide a recovery path for users
who are turning checksums off because they're getting errors they
don't want).  Any block that's written is written without checksums.

I think we need to be clear about how the system transitions between
these states.  In the current patch, AIUI, you can effectively go from
1-2 or 4-2 by setting page_checksums=on and from 2-4 by setting
page_checksums=off, but there's no easy way to ensure that you've
reached state 3 or that you've gotten back to state 1.  Some variant
of VACUUM seems like a good way of doing that, but it doesn't make
sense for example to have page_checksums=off and do VACUUM (CHECKSUMS
ON), or to have page_checksums=on and do VACUUM (CHECKSUMS OFF).  I
guess we could just reject those as error cases, but it would be
nicer, I think, to have an interface with a higher degree of
orthogonality.

There's probably more than one way to do that, but my personal
preference, as previously noted, is to make this a table-level option,
rather than a GUC.  Then, VACUUM (CHECKSUMS ON) can first change the
pg_class entry to indicate that checksums are enabling-in-progress
(i.e. 1-2), then scan the table, adding checksums, and then mark
checksums as fully enabled (i.e. 2-3).  VACUUM (CHECKSUMS OFF) can
proceed in symmetric fashion, marking checksums as
disabling-in-progress (3-4), then scanning the table and getting rid
of them, and then marking them fully disabled (4-1).  If a crash
happens in the middle somewhere, the state of the table can get left
as enabling-in-progress or disabling-in-progress, but a new VACUUM
(CHECKSUMS

Re: [HACKERS] performance-test farm

On Mon, Mar 5, 2012 at 5:20 PM, Tomas Vondra t...@fuzzy.cz wrote:
 The idea is that buildfarm systems that are known to have a) reasonable
 hardware and b) no other concurrent work going on could also do
 performance tests.  The main benefit of this approach is it avoids
 duplicating all of the system management and source code building work
 needed for any sort of thing like this; just leverage the buildfarm
 parts when they solve similar enough problems.  Someone has actually
 done all that already; source code was last sync'd to the build farm
 master at the end of March:  https://github.com/greg2ndQuadrant/client-code

 By far the #1 thing needed to move this forward from where it's stuck at
 now is someone willing to dig into the web application side of this.
 We're collecting useful data.  It needs to now be uploaded to the
 server, saved, and then reports of what happened generated.  Eventually
 graphs of performance results over time will be straighforward to
 generate.  But the whole idea requires someone else (not Andrew, who has
 enough to do) sits down and figures out how to extend the web UI with
 these new elements.

 Hi,

 I'd like to revive this thread. A few days ago we have finally got our
 buildfarm member working (it's called magpie) - it's spending ~2h a
 day chewing on the buildfarm tasks, so we can use the other 22h to do
 some useful work.

 I suppose most of you are busy with 9.2 features, but I'm not so I'd
 like to spend my time on this.

 Now that I had to set up the buildfarm member I'm somehow aware of how
 the buildfarm works. I've checked the PGBuildFarm/server-code and
 greg2ndQuadrant/client-code repositories and while I certainly am not a
 perl whiz, I believe I can tweak it to handle the performance-related
 result too.

 What is the current state of this effort? Is there someone else working
 on that? If not, I propose this (for starters):

  * add a new page Performance results to the menu, with a list of
    members that uploaded the perfomance-results

  * for each member, there will be a list of tests along with a running
    average for each test, last test and indicator if it improved, got
    worse or is the same

  * for each member/test, a history of runs will be displayed, along
    with a simple graph

I suspect that the second of these is not that useful; presumably
there will be many small, irrelevant variations.  The graph is the key
thing.

 I'm not quite sure how to define which members will run the performance
 tests - I see two options:

  * for each member, add a flag run performance tests so that we can
    choose which members are supposed to be safe

  OR

  * run the tests on all members (if enabled in build-farm.conf) and
    then decide which results are relevant based on data describing the
    environment (collected when running the tests)

First option seems better to me, but I don't run any buildfarm critters, so...

 I'm also wondering if

  * using the buildfarm infrastructure the right thing to do, if it can
    provide some 'advanced features' (see below)

  * we should use the current buildfarm members (although maybe not all
    of them)

The main advantage of using the buildfarm, AFAICS, is that it would
make it less work for people to participate in this new thing.

  * it can handle one member running the tests with different settings
    (various shared_buffer/work_mem sizes, num of clients etc.) and
    various hw configurations (for example magpie contains a regular
    SATA drive as well as an SSD - would be nice to run two sets of
    tests, one for the spinner, one for the SSD)

That would be nice.

  * this can handle 'pushing' a list of commits to test (instead of
    just testing the HEAD) so that we can ask the members to run the
    tests for particular commits in the past (I consider this to be
    very handy feature)

That would be nice, too.

I think it's great that you want to put some energy into this.  It's
something we have been talking about for years, but if we're making
any progress at all, it's glacial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER VERBOSE (9.1.3)

On Mon, Mar 5, 2012 at 4:40 PM, Larry Rosenman l...@lerctr.org wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Is there any way to get more info out of CLUSTER VERBOSE so it says
 what index it's working on AFTER the table re-write?

 INFO:  clustering public.values using sequential scan and sort
 INFO:  values: found 0 removable, 260953511 nonremovable row
 versions in 4224437 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 CPU 168.02s/4324.68u sec elapsed 8379.12 sec.


 And at this point it's doing something(tm), I assume re-doing the indexes.

 It would be nice(tm) to get more info.

 Ideas?

Try setting client_min_messages=DEBUG1.  At least on current sources
that gives some additional, relevant output; I think that's probably
there in 9.1.x as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-03-06 Thread Albe Laurenz

Shigeru Hanada wrote:
Connection should be closed only when the trigger is a
 top level transaction and it's aborting, but isTopLevel flag was not
 checked.  I fixed the bug and added regression tests for such cases.

I wondered about that - is it really necessary to close the remote
connection? Wouldn't a ROLLBACK on the remote connection be good enough?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pg 9.2 extension install

2012-03-06 Thread Misa Simic

OK, I have removed all *.o and all *.so files - and extension is
succesfully deployed on 9.2

Thanks,

Misa

2012/3/6 Misa Simic misa.si...@gmail.com

 Hi,

 I have made some pg extension for Pg 9.1 and want to deploy it on Pg
 9.2dev...

 When i try create extension it shows error version mismatch server is 9.2
 library has made for 9.1...

 How to make library for 9.2?

 Thanks,

 Misa

 Sent from my Windows Phone

Re: [HACKERS] elegant and effective way for running jobs inside a database

On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Regarding a functional area I can help... but I can not understand why
 this idea is so unappreciated?

I think it's a bit unfair to say that this idea is unappreciated.
There are LOTS of good features that we don't have yet simply because
nobody's had time to implement them.  There are many things I'd really
like to have that I have spent no time at all on as yet, just because
there are other things that I (or my employer) would like to have even
more.  The good news is that this is an open-source project and there
is always room at the table for more people who would like to
contribute (or fund others so that they can contribute).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-06 Thread Heikki Linnakangas


On 06.03.2012 14:52, Fujii Masao wrote:

On Tue, Mar 6, 2012 at 2:17 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 21.02.2012 13:19, Fujii Masao wrote:

In some places, the spinlock insertpos_lck is taken while another
spinlock info_lck is being held. Is this OK? What if unfortunately
inner spinlock takes long to be taken?



Hmm, that's only done at a checkpoint (and a restartpoint), so I doubt
that's a big issue in practice. We had the same pattern before the
patch, just with WALInsertLock instead of insertpos_lck. Holding a
spinlock longer is much worse than holding a lwlock longer, but
nevertheless I don't think that's a problem.


No, that's NOT okay.  A spinlock is only supposed to be held across a
short straight-line sequence of instructions.


Ok, that's easy enough to fix.


This also strikes me that the usage of the spinlock insertpos_lck might
not be OK in ReserveXLogInsertLocation() because a few dozen instructions
can be performed while holding the spinlock


I admit that block is longer than any of our existing spinlock blocks. 
However, it's important for performance. I tried using a lwlock earlier, 
and that negated the gains. So if that's a serious objection, then let's 
resolve that now before I spend any more time on other aspects of the 
patch. Any ideas how to make that block shorter?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

On Tue, Mar 6, 2012 at 10:07 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I admit that block is longer than any of our existing spinlock blocks.
 However, it's important for performance. I tried using a lwlock earlier, and
 that negated the gains. So if that's a serious objection, then let's resolve
 that now before I spend any more time on other aspects of the patch. Any
 ideas how to make that block shorter?

We shouldn't put the cart in front of the horse.  The point of keeping
spinlock acquisitions short is to improve performance by preventing
excess spinning.  If the performance is better with a spinlock than
with an lwlock, then clearly the spinning isn't excessive, or at least
not in the case you tested.

That having been said, shorter critical sections are always good, of course...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 06.03.2012 14:52, Fujii Masao wrote:
 This also strikes me that the usage of the spinlock insertpos_lck might
 not be OK in ReserveXLogInsertLocation() because a few dozen instructions
 can be performed while holding the spinlock

 I admit that block is longer than any of our existing spinlock blocks. 
 However, it's important for performance. I tried using a lwlock earlier, 
 and that negated the gains. So if that's a serious objection, then let's 
 resolve that now before I spend any more time on other aspects of the 
 patch. Any ideas how to make that block shorter?

How long is the current locked code exactly --- does it contain a loop?

I'm not sure where the threshold of pain is for length of time holding a
spinlock.  I wouldn't go out of the way to avoid using a spinlock for
say a hundred instructions, at least not unless it was a very
high-contention lock.  But sleeping while holding a spinlock is right out.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

On Mon, Mar 05, 2012 at 11:38:33PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  By doing a DROP CASCADE on plpython2, you drop the user functions, but
  not the support functions.
 
 Well, yeah.  The language depends on the support functions, not the
 other way around.
 
  This certainly looks like a bug.  Should I work on a patch?
 
 It's not a bug, and it's unlikely you can fix it in pg_upgrade without
 making things worse.
 
 The long-run plan is that the procedural language and its support
 functions are all part of an extension and what you do is drop the
 extension.  We're not quite there yet.  As of 9.1, if you do create
 extension plpython2 to start with, dropping the extension does drop the
 support functions too ... but if you use the legacy create language
 syntax, that doesn't happen, because an extension object isn't created.

Well, if CREATE LANGUAGE created those functions, it seems logical that
DROP FUNCTION removes them.  Why is that not a bug?  I am not saying we
have to fix it, but sure seems like a bug to me.  Are you saying other
objects might rely on those functions?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Regarding a functional area I can help... but I can not understand why
 this idea is so unappreciated?

 I think it's a bit unfair to say that this idea is unappreciated.

Well, there is the question of why we should re-invent the cron wheel.

 There are LOTS of good features that we don't have yet simply because
 nobody's had time to implement them.

Implementation work is only part of it.  Any large feature will create
an ongoing, distributed maintenance overhead.  It seems entirely
possible to me that we'd not accept such a feature even if someone
dropped a working implementation on us.

But having said that, it's not apparent to me why such a thing would
need to live inside the database at all.  It's very easy to visualize
a task scheduler that runs as a client and requires nothing new from the
core code.  Approaching the problem that way would let the scheduler
be an independent project that stands or falls on its own merits.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:

 4. Checksums are being removed, but some blocks may still have them.
 Thus, it's not an error for a block to have no checksum, but any
 still-remaining checksums should be correct (though possibly we ought
 not to complain if they aren't, to provide a recovery path for users
 who are turning checksums off because they're getting errors they
 don't want).  Any block that's written is written without checksums.

I agree its advantageous to have a means of removing pagesums from
data blocks as a 4th state.

I think we need a 5th state - pagesums disabled. Which allows an
emergency disabling of the feature without rewriting the blocks.
Obviously if the database is damaged and I/O devices going bad, trying
to rescan database is likely to cause further problems.

 I think we need to be clear about how the system transitions between
 these states.  In the current patch, AIUI, you can effectively go from
 1-2 or 4-2 by setting page_checksums=on and from 2-4 by setting
 page_checksums=off, but there's no easy way to ensure that you've
 reached state 3 or that you've gotten back to state 1.  Some variant
 of VACUUM seems like a good way of doing that, but it doesn't make
 sense for example to have page_checksums=off and do VACUUM (CHECKSUMS
 ON), or to have page_checksums=on and do VACUUM (CHECKSUMS OFF).  I
 guess we could just reject those as error cases, but it would be
 nicer, I think, to have an interface with a higher degree of
 orthogonality.

Right, a misunderstanding I think.

If we have states set at database level then we'd not have a GUC as well.

 There's probably more than one way to do that, but my personal
 preference, as previously noted, is to make this a table-level option,
 rather than a GUC.  Then, VACUUM (CHECKSUMS ON) can first change the
 pg_class entry to indicate that checksums are enabling-in-progress
 (i.e. 1-2), then scan the table, adding checksums, and then mark
 checksums as fully enabled (i.e. 2-3).  VACUUM (CHECKSUMS OFF) can
 proceed in symmetric fashion, marking checksums as
 disabling-in-progress (3-4), then scanning the table and getting rid
 of them, and then marking them fully disabled (4-1).  If a crash
 happens in the middle somewhere, the state of the table can get left
 as enabling-in-progress or disabling-in-progress, but a new VACUUM
 (CHECKSUMS X) can be used to finish the process, and we always know
 exactly where we're at.

Any in-progress state needs to have checksums removed first, then re-added.

I'll keep an open mind for now about database/table level. I'm not
sure how possible/desirable each is.

 I generally agree with this outline, though I think that in lieu of a
 version number we could simply set a new pd_flags bit indicating that
 checksums are enabled.  If we haven't fully enabled checksums yet,
 then the fact that this bit isn't set is not an error; but if
 checksums are fully enabled, then every page must have that bit set,
 and any page that doesn't is ipso facto corrupt.

Whether to have it or not, if a corruption occurs during
checksum-enabling then we could get a false reading. If we have a bit
then the bit can be set wrong, so we could either make a check when it
wasn't due, or skip a check we should have made. If we don't have a
bit and so skip checksum checking during enabling process then we can
get an error that isn't spotted by the checksum process.

Given it can happen both ways, we should have a bit/ or not depending
upon which is the least likely to be wrong. I would say having the bit
provides the least likely way to get false readings.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [9.2] Confusion over CacheRegisterSyscacheCallback

2012-03-06 Thread Marko Kreen

Commit b5282aa893:
  Revise sinval code to remove no-longer-used tuple TID from inval messages.

dropped ItemPointer from callbacks and replaced it with hashValue.
There seems to be 2 ways that new backend code calculates it:

- hashoid(oid), which seems to assume too much?
- CatalogCacheComputeTupleHashValue(), which is 'static'.

So my question is that after doing generic SearchSysCache() like:

   tup = SearchSysCache(USERMAPPINGUSERSERVER,
ObjectIdGetDatum(user_mapping-userid),
ObjectIdGetDatum(foreign_server-serverid),
0, 0);

what is the proper way to calculate the hashValue that 
will be given to callback?

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012/3/6 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Regarding a functional area I can help... but I can not understand why
 this idea is so unappreciated?

 I think it's a bit unfair to say that this idea is unappreciated.

 Well, there is the question of why we should re-invent the cron wheel.

 There are LOTS of good features that we don't have yet simply because
 nobody's had time to implement them.

 Implementation work is only part of it.  Any large feature will create
 an ongoing, distributed maintenance overhead.  It seems entirely
 possible to me that we'd not accept such a feature even if someone
 dropped a working implementation on us.

 But having said that, it's not apparent to me why such a thing would
 need to live inside the database at all.  It's very easy to visualize
 a task scheduler that runs as a client and requires nothing new from the
 core code.  Approaching the problem that way would let the scheduler
 be an independent project that stands or falls on its own merits.

There are a few arguments for scheduler in core

* platform independence
* possible richer workflow and loging possibilities or as minimum -
better integration with SP
* when application has lot of business logic in stored procedures,
then outer scheduler is little bit foreign element - harder
maintaining, harder configuration
* when somebody would to implement some like materialised views, then
have to have use outer schedule for very simple task - just exec SP
every 5 minutes

so I think there are reason why we can have a scheduler on core -
simple or richer, but it can helps. cron and similar works, but
maintaining of external scheduler is more expensive then using some
simple scheduler in core.

Regards

Pavel


                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

On Tue, Mar 6, 2012 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 5, 2012 at 5:03 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Regarding a functional area I can help... but I can not understand why
 this idea is so unappreciated?

 I think it's a bit unfair to say that this idea is unappreciated.

 Well, there is the question of why we should re-invent the cron wheel.

 There are LOTS of good features that we don't have yet simply because
 nobody's had time to implement them.

 Implementation work is only part of it.  Any large feature will create
 an ongoing, distributed maintenance overhead.  It seems entirely
 possible to me that we'd not accept such a feature even if someone
 dropped a working implementation on us.

 But having said that, it's not apparent to me why such a thing would
 need to live inside the database at all.  It's very easy to visualize
 a task scheduler that runs as a client and requires nothing new from the
 core code.  Approaching the problem that way would let the scheduler
 be an independent project that stands or falls on its own merits.

I was trying to make a general comment about PostgreSQL development,
without diving too far into the merits or demerits of this particular
feature.  I suspect you'd agree with me that, in general, a lot of
valuable things don't get done because there aren't enough people or
enough hours in the day, and we can always use more contributors.

But since you brought it up, I think there is a lot of value to having
a scheduler that's integrated with the database.  There are many
things that the database does which could also be done outside the
database, but people want them in the database because it's easier
that way.  If you have a web application that talks to the database,
and which sometimes needs to schedule tasks to run at a future time,
it is much nicer to do that by inserting a row into an SQL table
somewhere, or executing some bit of DDL, than it is to do it by making
your web application know how to connect to a PostgreSQL database and
also how to rewrite crontab (in a concurrency-safe manner, no less).

Now, the extent to which such a schedule requires core support is
certainly arguable.  Maybe it doesn't, and can be an entirely
stand-alone project.  pgAgent aims to do something like this, but it
has a number of deficiencies, including a tendency to quit
unexpectedly and a very klunky interface.  Those are things that could
presumably fixed, or done differently in a new implementation, and
maybe that's all anyone needs.  Or maybe it's not.  But at any rate I
think the idea of a better job scheduler is a good one, and if anyone
is interested in working on that, I think we should encourage them to
do so, regardless of what happens vis-a-vis core.  This is a very
common need, and the current solutions are clearly more awkward than
our users would like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

Bruce Momjian br...@momjian.us writes:
 On Mon, Mar 05, 2012 at 11:38:33PM -0500, Tom Lane wrote:
 Well, yeah.  The language depends on the support functions, not the
 other way around.

 Well, if CREATE LANGUAGE created those functions, it seems logical that
 DROP FUNCTION removes them.  Why is that not a bug?

You can call it a bug all you want, but changing the way those
dependencies work is not a good idea.  As I said, the right path forward
is to work towards putting the PL and its support functions into an
extension, and that change doesn't seem like a bug fix so much as a
fundamental revision of how PLs work.

 Are you saying other
 objects might rely on those functions?

IIRC we have cases where multiple PLs share the same support functions;
at least, the CREATE LANGUAGE code is written with the assumption that
that's okay.  Perhaps we'd have to stop doing that in order to let them
be treated as independent extensions.  It needs some thought.

To my mind this is all bound up with getting rid of pg_pltemplate,
which was last discussed in this thread:
http://archives.postgresql.org/pgsql-hackers/2011-08/msg01045.php
We don't seem to quite have consensus on how to proceed forward.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:

 For the reasons stated above, I believe pd_tli is less useful than
 pd_pagesize_version.  I fear that if we repurpose pd_pagesize_version,
 we're going to make things very difficult for people who want to write
 block-inspection tools, like pg_filedump or pageinspect.  Right now,
 it's possible to look at that offset within the block and know for
 certain what page version you're dealing with.  If we repurpose it to
 hold checksum data, that will no longer be possible.  Unlike pd_tli,
 pd_pagesize_version is validated every time we read a block.

We've not changed the page format in 5 years. I really can't see what
the value of having a constant stored on every data block, especially
since you're now saying that we *shouldn't* bump the constant for this
change. Surely if we are keeping the pd_pagesize_version field its
obvious that we should increment it? If not, why the insistence on
keeping the field if we aren't using it for its stated purpose?

Do you know of any PostgreSQL variant that can set this byte range to
different values? If so, I'd suggest we just declare the field user
defined or some such so that others can use it for different things
as well and then use pd_tli.

IMHO if we keep use pd_tli but pd_pagesize_version then we should increment it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 6, 2012 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 But having said that, it's not apparent to me why such a thing would
 need to live inside the database at all.  It's very easy to visualize
 a task scheduler that runs as a client and requires nothing new from the
 core code.  Approaching the problem that way would let the scheduler
 be an independent project that stands or falls on its own merits.

 But since you brought it up, I think there is a lot of value to having
 a scheduler that's integrated with the database.  There are many
 things that the database does which could also be done outside the
 database, but people want them in the database because it's easier
 that way.  If you have a web application that talks to the database,
 and which sometimes needs to schedule tasks to run at a future time,
 it is much nicer to do that by inserting a row into an SQL table
 somewhere, or executing some bit of DDL, than it is to do it by making
 your web application know how to connect to a PostgreSQL database and
 also how to rewrite crontab (in a concurrency-safe manner, no less).

Sure, and I would expect that a client-side scheduler would work just
the same way: you make requests to it through database actions such
as inserting a row in a task table.

 Now, the extent to which such a schedule requires core support is
 certainly arguable.  Maybe it doesn't, and can be an entirely
 stand-alone project.  pgAgent aims to do something like this, but it
 has a number of deficiencies, including a tendency to quit
 unexpectedly and a very klunky interface.

Well, if they didn't get it right the first time, that suggests that
it's a harder problem than people would like to think.  All the more
reason to do it as an external project, at least to start with.
I would much rather entertain a proposal to integrate a design that's
been proven by an external implementation, than a proposal to implement
a design that's never been tested at all (which we'll nonetheless have
to support for eternity, even if it turns out to suck).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 3:31 PM, Simon Riggs si...@2ndquadrant.com wrote:

 I'll keep an open mind for now about database/table level. I'm not
 sure how possible/desirable each is.

Table level sounds great, but how will it work with recovery? We don't
have a relcache in Startup process.

So either database or tablespace level seems doable.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 10:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
 4. Checksums are being removed, but some blocks may still have them.
 Thus, it's not an error for a block to have no checksum, but any
 still-remaining checksums should be correct (though possibly we ought
 not to complain if they aren't, to provide a recovery path for users
 who are turning checksums off because they're getting errors they
 don't want).  Any block that's written is written without checksums.

 I agree its advantageous to have a means of removing pagesums from
 data blocks as a 4th state.

 I think we need a 5th state - pagesums disabled. Which allows an
 emergency disabling of the feature without rewriting the blocks.
 Obviously if the database is damaged and I/O devices going bad, trying
 to rescan database is likely to cause further problems.

Maybe we should consider making this a separate flag that controls
what we do when we detect a checksum failure -- ERROR, LOG, or just
count it in the stats -- rather than an extra state.

 I think we need to be clear about how the system transitions between
 these states.  In the current patch, AIUI, you can effectively go from
 1-2 or 4-2 by setting page_checksums=on and from 2-4 by setting
 page_checksums=off, but there's no easy way to ensure that you've
 reached state 3 or that you've gotten back to state 1.  Some variant
 of VACUUM seems like a good way of doing that, but it doesn't make
 sense for example to have page_checksums=off and do VACUUM (CHECKSUMS
 ON), or to have page_checksums=on and do VACUUM (CHECKSUMS OFF).  I
 guess we could just reject those as error cases, but it would be
 nicer, I think, to have an interface with a higher degree of
 orthogonality.

 Right, a misunderstanding I think.

 If we have states set at database level then we'd not have a GUC as well.

OK.  If the flag is set at database level rather than table or cluster
level, then shared catalogs become a bit of a wart.  Maybe managably
so, but it needs thought, at the least.

 There's probably more than one way to do that, but my personal
 preference, as previously noted, is to make this a table-level option,
 rather than a GUC.  Then, VACUUM (CHECKSUMS ON) can first change the
 pg_class entry to indicate that checksums are enabling-in-progress
 (i.e. 1-2), then scan the table, adding checksums, and then mark
 checksums as fully enabled (i.e. 2-3).  VACUUM (CHECKSUMS OFF) can
 proceed in symmetric fashion, marking checksums as
 disabling-in-progress (3-4), then scanning the table and getting rid
 of them, and then marking them fully disabled (4-1).  If a crash
 happens in the middle somewhere, the state of the table can get left
 as enabling-in-progress or disabling-in-progress, but a new VACUUM
 (CHECKSUMS X) can be used to finish the process, and we always know
 exactly where we're at.

 Any in-progress state needs to have checksums removed first, then re-added.

I don't think that's necessarily true.  If the user begins disabling
checksums (i.e. they are in state #4) and they change their mind, they
should be able to switch back to enabling mode.

 I'll keep an open mind for now about database/table level. I'm not
 sure how possible/desirable each is.

I think that per-table is clearly more desirable, but I haven't
studied how invasive the code changes are.  I suspect it is manageable
but I might be building castles in the air.

 I generally agree with this outline, though I think that in lieu of a
 version number we could simply set a new pd_flags bit indicating that
 checksums are enabled.  If we haven't fully enabled checksums yet,
 then the fact that this bit isn't set is not an error; but if
 checksums are fully enabled, then every page must have that bit set,
 and any page that doesn't is ipso facto corrupt.

 Whether to have it or not, if a corruption occurs during
 checksum-enabling then we could get a false reading.

That is true, and I think unavoidable.  While checksums are being
enabled, we have to assume that a not-checksummed page just hasn't
been updated yet.  There's no way to distinguish those cases.

 If we have a bit
 then the bit can be set wrong, so we could either make a check when it
 wasn't due, or skip a check we should have made. If we don't have a
 bit and so skip checksum checking during enabling process then we can
 get an error that isn't spotted by the checksum process.

 Given it can happen both ways, we should have a bit/ or not depending
 upon which is the least likely to be wrong. I would say having the bit
 provides the least likely way to get false readings.

I agree that having a bit is good.  It's not really necessary, but it
makes it easier to understand the logic, and allows us to catch and
distinguish error cases better.  Once checksums are fully enabled, all
checksums must match, bit or no bit.  But it's easier to understand
what the intermediate states look like with

Re: [HACKERS] [9.2] Confusion over CacheRegisterSyscacheCallback

Marko Kreen mark...@gmail.com writes:
 So my question is that after doing generic SearchSysCache() like:

tup = SearchSysCache(USERMAPPINGUSERSERVER,
 ObjectIdGetDatum(user_mapping-userid),
 ObjectIdGetDatum(foreign_server-serverid),
 0, 0);

 what is the proper way to calculate the hashValue that 
 will be given to callback?

Why would you need to know that?  The reason the calculation function
is static is that there's no apparent need to expose that information
outside the syscache subsystem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

On Tue, Mar 06, 2012 at 10:38:31AM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Mon, Mar 05, 2012 at 11:38:33PM -0500, Tom Lane wrote:
  Well, yeah.  The language depends on the support functions, not the
  other way around.
 
  Well, if CREATE LANGUAGE created those functions, it seems logical that
  DROP FUNCTION removes them.  Why is that not a bug?
 
 You can call it a bug all you want, but changing the way those
 dependencies work is not a good idea.  As I said, the right path forward
 is to work towards putting the PL and its support functions into an
 extension, and that change doesn't seem like a bug fix so much as a
 fundamental revision of how PLs work.

I understand.

  Are you saying other
  objects might rely on those functions?
 
 IIRC we have cases where multiple PLs share the same support functions;
 at least, the CREATE LANGUAGE code is written with the assumption that
 that's okay.  Perhaps we'd have to stop doing that in order to let them
 be treated as independent extensions.  It needs some thought.

Good point on sharing those functions.

 To my mind this is all bound up with getting rid of pg_pltemplate,
 which was last discussed in this thread:
 http://archives.postgresql.org/pgsql-hackers/2011-08/msg01045.php
 We don't seem to quite have consensus on how to proceed forward.

OK.  At least I understand what is happening, but it certainly surprised
me.  We need to pass on the idea that users have to manually delete
those functions if they are going from plpython2 to plpython3 with
pg_upgrade.  I think users have been doing it in the past by not
switching from plpython2 to plpython3 during the upgrade, but the
one-click installer change makes that necessary.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database


Excerpts from Tom Lane's message of mar mar 06 12:47:46 -0300 2012:
 Robert Haas robertmh...@gmail.com writes:

  But since you brought it up, I think there is a lot of value to having
  a scheduler that's integrated with the database.  There are many
  things that the database does which could also be done outside the
  database, but people want them in the database because it's easier
  that way.  If you have a web application that talks to the database,
  and which sometimes needs to schedule tasks to run at a future time,
  it is much nicer to do that by inserting a row into an SQL table
  somewhere, or executing some bit of DDL, than it is to do it by making
  your web application know how to connect to a PostgreSQL database and
  also how to rewrite crontab (in a concurrency-safe manner, no less).
 
 Sure, and I would expect that a client-side scheduler would work just
 the same way: you make requests to it through database actions such
 as inserting a row in a task table.

What such an external scheduler would need from core is support for
starting up and shutting down along postmaster (as well as restarts at
appropriate times).  Postmaster already has the ability to start and
shut down many processes depending on several different policies; I
think it's mostly a matter of exporting that funcionality in a sane way.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 10:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
 For the reasons stated above, I believe pd_tli is less useful than
 pd_pagesize_version.  I fear that if we repurpose pd_pagesize_version,
 we're going to make things very difficult for people who want to write
 block-inspection tools, like pg_filedump or pageinspect.  Right now,
 it's possible to look at that offset within the block and know for
 certain what page version you're dealing with.  If we repurpose it to
 hold checksum data, that will no longer be possible.  Unlike pd_tli,
 pd_pagesize_version is validated every time we read a block.

 We've not changed the page format in 5 years. I really can't see what
 the value of having a constant stored on every data block, especially
 since you're now saying that we *shouldn't* bump the constant for this
 change. Surely if we are keeping the pd_pagesize_version field its
 obvious that we should increment it? If not, why the insistence on
 keeping the field if we aren't using it for its stated purpose?

 Do you know of any PostgreSQL variant that can set this byte range to
 different values? If so, I'd suggest we just declare the field user
 defined or some such so that others can use it for different things
 as well and then use pd_tli.

 IMHO if we keep use pd_tli but pd_pagesize_version then we should increment 
 it.

The fact that we haven't changed the page format in 5 years is a good
thing, and I hope that we won't change it very often because it will
require whole-cluster rewrites to take full advantage of whatever
features are made available by the version bump, which is darn
painful.  However, I'm pretty sure that eventually we're going to want
to bump it.  Aside from checksums, the most imminent thing I can think
of that might cause us to do that is this idea regarding XID
wraparound:

http://archives.postgresql.org/message-id/4f2fa541.8040...@enterprisedb.com

However, even if we don't do that or we find some way to do it without
bumping the page version, it seems likely to me that something else
will come up eventually.  The size of our page header doesn't thrill
me, but the one byte we've allocated to storing the version is only a
minor contributor and pretty cheap insurance against future needs.

As to whether we should increment pd_pagesize_version, I'm not sure
quite what you were saying about that (I think you may have an extra
or missing word there), but I don't think it's necessary here.  I
believe we feel free to assign new flag bits without bumping the page
size version, so we could define PD_HAS_CHECKSUM without doing so.
Maybe your point is that we're changing the meaning of pd_tli and it
seems ugly to do that without the bumping the page version, but I
guess my point is that we're not changing it incompatibly.  We really
only need to bump the page version for changes where a newer version
of the server would otherwise misinterpret an older page, which isn't
a problem in this case because pd_tli is basically dead already.

And, on a more practical level, Tom argued on the other thread that if
we have a page upgrade facility, then we ought to store the minimum
page version for every relation in a pg_class column, so we can keep
track of when all pages of the older format are gone.  That's
infrastructure that this patch doesn't really need, and we can avoid
having to build it by steering clear of the page versioning issue
altogether, viewing this instead as an enhancement of the existing
page format that doesn't break compatibility with older releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 3:31 PM, Simon Riggs si...@2ndquadrant.com wrote:

 I'll keep an open mind for now about database/table level. I'm not
 sure how possible/desirable each is.

 Table level sounds great, but how will it work with recovery? We don't
 have a relcache in Startup process.

 So either database or tablespace level seems doable.

Even db or ts level is problematic.

Options

(1) Recovery ignores checksums until db in consistent state

(2) Recovery ignores checksums until all databases are enabled, when
we set flag in pg_control

(3) Recovery checks blocks marked as having a checksum, no matter the
overall state

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:

 As to whether we should increment pd_pagesize_version, I'm not sure
 quite what you were saying about that (I think you may have an extra
 or missing word there), but I don't think it's necessary here.

I said this...

On Tue, Mar 6, 2012 at 3:40 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Do you know of any PostgreSQL variant that can set this byte range to
 different values?

Not sure what the missing word is there, so I'll ask again.

Has EDB or anybody else you know of has used the pd_pagesize_version
field for something else, so you'd rather I didn't touch that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 12:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:

 As to whether we should increment pd_pagesize_version, I'm not sure
 quite what you were saying about that (I think you may have an extra
 or missing word there), but I don't think it's necessary here.

 I said this...

 On Tue, Mar 6, 2012 at 3:40 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Do you know of any PostgreSQL variant that can set this byte range to
 different values?

 Not sure what the missing word is there, so I'll ask again.

 Has EDB or anybody else you know of has used the pd_pagesize_version
 field for something else, so you'd rather I didn't touch that?

To my knowledge, EDB has not ever stamped pages with a different
pd_pagesize_version, or reused that field for any other purpose, but
I'm not sure about Greenplum or others.  I have seen at least one
pg_filedump output where every page was stamped with the same very odd
value there; I don't remember what the value was, and I don't know
where the output I was sent came from, but it was enough to make me
wonder if somebody's done that.

That's not why I want to leave that field alone, though: I want to
leave that field alone for backward and forward compatibility, so that
any version of community PostgreSQL ever released - and any page
inspection tools, current or future - can look at the low-order byte
of that field and know what page version they've got.  If we didn't
have some other bytes in the page header that seem relatively useless
(like pd_tli and the high bits of pd_pagesize_version), I'd be arguing
for extending the page header rather than clobbering the version
number.  It just seems to me that the page version number is
absolutely the most basic piece of information on the page, which you
must have before you can interpret the rest of the page contents.  For
it to have any value in identifying past page format changes, future
page format changes, or format changes by forks of our main code base,
it's got to be present in every release and have the same meaning in
each one.  If we do a release where 2 can mean the checksum is 2
rather than the page was written by PostgreSQL 8.0 then we've
forever lost the ability to decide, without the use of heuristics,
what kind of page we've got.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

2012-03-06 Thread Heikki Linnakangas


On 06.03.2012 19:00, Simon Riggs wrote:

On Tue, Mar 6, 2012 at 4:42 PM, Robert Haasrobertmh...@gmail.com  wrote:


As to whether we should increment pd_pagesize_version, I'm not sure
quite what you were saying about that (I think you may have an extra
or missing word there), but I don't think it's necessary here.


I said this...

On Tue, Mar 6, 2012 at 3:40 PM, Simon Riggssi...@2ndquadrant.com  wrote:

Do you know of any PostgreSQL variant that can set this byte range to
different values?


Not sure what the missing word is there, so I'll ask again.

Has EDB or anybody else you know of has used the pd_pagesize_version
field for something else, so you'd rather I didn't touch that?


The EDB page format is exactly the same as the community one. Thanks for 
asking.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 11:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 3:31 PM, Simon Riggs si...@2ndquadrant.com wrote:

 I'll keep an open mind for now about database/table level. I'm not
 sure how possible/desirable each is.

 Table level sounds great, but how will it work with recovery? We don't
 have a relcache in Startup process.

 So either database or tablespace level seems doable.

 Even db or ts level is problematic.

 Options

 (1) Recovery ignores checksums until db in consistent state

 (2) Recovery ignores checksums until all databases are enabled, when
 we set flag in pg_control

 (3) Recovery checks blocks marked as having a checksum, no matter the
 overall state

How about combining #1 and #3?  If the database isn't consistent yet
(and thus we can't look at pg_database) then we rely on the blocks
themselves to tell us whether they have checksums.  Once we reach
consistency we can do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Artur Litwinowicz

With all the respect to all into this Community...
I have no experience enough rich with C or C++ to say yes I can do that
alone. I do not know the internals of PostgreSQL at all. But I have
quite long experience with other languages.
I imagine if you have a piece of code which can run function like
Select function(123); you can reuse it (with some modifications) to
run jobs saved in job manager tables in the same manner. All we need is
two crazy (some simplification) loops - one for job running and one
for control and logging purposes - all fast enought with period not
slower then 5s or faster.

Algorithm for first loop:
check jobs exists and is time to run it
   run job as other sql statements (some validity check may be done)
   get next job
no jobs - delay

second loop:
find started job
   check it is still working
  if error log it, calculate next start time
(may be longer then at the first time)
if configured and clean up
  if works fine log duration
  if just finished log it, calculate next run and clean up
   find next job
no jobs - delay

And it will be art of state if the job could return (but not have to)
next run time value (for log loop to save).
And it is quite all about I wanted to say - do not understand me bad (I
do not want to teach anyone or something like that ;) - I wanted just
explain what I meant.

Best regards,
Artur



0xAF4A859D.asc
Description: application/pgp-keys

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 5:14 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Has EDB or anybody else you know of has used the pd_pagesize_version
 field for something else, so you'd rather I didn't touch that?


 The EDB page format is exactly the same as the community one. Thanks for
 asking.

No problem, anytime. Just wanted to make sure downstream wasn't affected.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 5:14 PM, Robert Haas robertmh...@gmail.com wrote:

 Options

 (1) Recovery ignores checksums until db in consistent state

 (2) Recovery ignores checksums until all databases are enabled, when
 we set flag in pg_control

 (3) Recovery checks blocks marked as having a checksum, no matter the
 overall state

 How about combining #1 and #3?  If the database isn't consistent yet
 (and thus we can't look at pg_database) then we rely on the blocks
 themselves to tell us whether they have checksums.  Once we reach
 consistency we can do better.

We can change state then, but to what? We don't have a relcache.

Maybe that puts us back at Square #1. Will think

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: URI connection string support for libpq

2012-03-06 Thread Peter Eisentraut

On tis, 2012-03-06 at 10:11 +0200, Alexander Shulgin wrote:
 On 03/06/2012 01:09 AM, Peter Eisentraut wrote:
 
  On ons, 2012-02-22 at 12:26 -0500, Greg Smith wrote:
  I started collecting up all the variants that do work as an
  initial shell script regression test, so that changes don't break
  something that already works.  Here are all the variations that
  already work, setup so that a series of 1 outputs is passing:
 
  Let's please add something like this to the patch.  Otherwise, I foresee
  a lot of potential to break corner cases in the future.
 
 I've included a (separate) test shell script based on Greg's cases in 
 one of the updates.  What would be preferred place to plug it in? 
 Override installcheck in libpq Makefile?

I think that would be the right place.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Christopher Browne

On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Algorithm for first loop:
 check jobs exists and is time to run it
   run job as other sql statements (some validity check may be done)
   get next job
 no jobs - delay

There are crucial things missing here, namely the need to establish at
least one database connection in order to be able to check for the
existence of jobs, as well as to establish additional connections as
contexts in which to run jobs.

That implies the need for some New Stuff that isn't quite the same as
what we have within server processes today.

There is nothing horrible about this; just that there's some extra
mechanism that needs to come into existence in order to do this.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 12:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Mar 6, 2012 at 5:14 PM, Robert Haas robertmh...@gmail.com wrote:
 Options

 (1) Recovery ignores checksums until db in consistent state

 (2) Recovery ignores checksums until all databases are enabled, when
 we set flag in pg_control

 (3) Recovery checks blocks marked as having a checksum, no matter the
 overall state

 How about combining #1 and #3?  If the database isn't consistent yet
 (and thus we can't look at pg_database) then we rely on the blocks
 themselves to tell us whether they have checksums.  Once we reach
 consistency we can do better.

 We can change state then, but to what? We don't have a relcache.

If the state is per-database or per-tablespace, you can read
pg_database at that point and see what the flag says.  If it's
per-relation, then I agree: you still don't have enough information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Algorithm for first loop:
 check jobs exists and is time to run it
   run job as other sql statements (some validity check may be done)
   get next job
 no jobs - delay

 There are crucial things missing here, namely the need to establish at
 least one database connection in order to be able to check for the
 existence of jobs, as well as to establish additional connections as
 contexts in which to run jobs.

 That implies the need for some New Stuff that isn't quite the same as
 what we have within server processes today.

 There is nothing horrible about this; just that there's some extra
 mechanism that needs to come into existence in order to do this.

And also some interface.  It'd be useful to have background jobs that
executed either immediately or at a certain time or after a certain
delay, as well as repeating jobs that execute at a certain interval or
on a certain schedule.  Figuring out what all that should look like
is, well, part of the work that someone has to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 06, 2012 at 09:25:17AM -0500, Robert Haas wrote:
  2. Turning checksums on/off/on/off in rapid succession can cause false
  positive reports of checksum failure if crashes occur and are ignored.
  That may lead to the feature and PostgreSQL being held in disrepute.
 
 This I do think is a problem, although not for precisely the reason
 stated here.  In my experience, in data corruption situations, the
 first thing customers do is blame PostgreSQL: they don't believe it's
 the hardware; they accuse us of having bugs in our code.  Having a
 checksum feature would be valuable, because, first, we'd perhaps
 detect problems sooner and, second, people understand what checksums
 are and that checksum failures really shouldn't happen unless the
 hardware is bad.  More generally, one of the purposes of checksums is
 to distinguish hardware failure from other possible causes of data
 corruption problems.  If there are code paths where checksum failures
 can happy despite the hardware being good, I think that the patch will
 fail to accomplish its goal of giving us confidence that the hardware
 is bad.

I think the turning checksums on/off/on/off is really a killer
problem, and obviously many of the actions needed to make it safe make
the checksum feature itself less useful.  

One crazy idea would be to have a checksum _version_ number somewhere on
the page and in pg_controldata.  When you turn on checksums, you
increment that value, and all new checksum pages get that checksum
version;  if you turn off checksums, we just don't check them anymore,
but they might get incorrect due to a hint bit write and a crash.  When
you turn on checksums again, you increment the checksum version again,
and only check pages having the _new_ checksum version.

Yes, this does add additional storage requirements for the checksum, but
I don't see another clean option.  If you can spare one byte, that gives
you 255 times to turn on checksums;   after that, you have to
dump/reload to use the checksum feature.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Mon, Mar 05, 2012 at 03:03:18PM +, Simon Riggs wrote:
 To avoid any confusion as to where this proposed feature is now, I'd
 like to summarise my understanding, make proposals and also request
 clear feedback on them.
 
 Checksums have a number of objections to them outstanding.
 
 1. We don't need them because there will be something better in a
 later release. I don't think anybody disagrees that a better solution
 is possible in the future; doubts have been expressed as to what will
 be required and when that is likely to happen. Opinions differ. We can
 and should do something now unless there is reason not to.

Obviously, one reason not to do this now is that we are way past time to
be designing any feature.  As much as I like how it has progressed and
how it handles pg_upgrade issues, I don't think anyone can state that
this feature is ready to go, and considering how far we are into the
last commit-fest, I think we can fairly say this patch has gotten good
review and return it with feedback.  We can keep discussing it (and I
just posted some ideas myself), but I don't think we can any longer
pretend that this is going into Postgres 9.2.

 2. Turning checksums on/off/on/off in rapid succession can cause false
 positive reports of checksum failure if crashes occur and are ignored.
 That may lead to the feature and PostgreSQL being held in disrepute.
 This can be resolved, if desired, by having having a two-stage
 enabling process where we must issue a command that scans every block
 in the database before checksum checking begins. VACUUM is easily
 modified to the task, we just need to agree that is suitable and agree
 syntax.
 A suggestion is VACUUM ENABLE CHECKSUMS; others are possible.

There is no question that if this feature is done badly, it will make
Postgres look bad, and no one wants that.  As nice as this features is,
making Postgres look bad can't justify it as it currently sits.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012/3/6 Robert Haas robertmh...@gmail.com:
 On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne cbbro...@gmail.com 
 wrote:
 On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz ad...@ybka.com wrote:
 Algorithm for first loop:
 check jobs exists and is time to run it
   run job as other sql statements (some validity check may be done)
   get next job
 no jobs - delay

 There are crucial things missing here, namely the need to establish at
 least one database connection in order to be able to check for the
 existence of jobs, as well as to establish additional connections as
 contexts in which to run jobs.

 That implies the need for some New Stuff that isn't quite the same as
 what we have within server processes today.

 There is nothing horrible about this; just that there's some extra
 mechanism that needs to come into existence in order to do this.

 And also some interface.  It'd be useful to have background jobs that
 executed either immediately or at a certain time or after a certain
 delay, as well as repeating jobs that execute at a certain interval or
 on a certain schedule.  Figuring out what all that should look like
 is, well, part of the work that someone has to do.

+1

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:

 One crazy idea would be to have a checksum _version_ number somewhere on
 the page and in pg_controldata.  When you turn on checksums, you
 increment that value, and all new checksum pages get that checksum
 version;  if you turn off checksums, we just don't check them anymore,
 but they might get incorrect due to a hint bit write and a crash.  When
 you turn on checksums again, you increment the checksum version again,
 and only check pages having the _new_ checksum version.

 Yes, this does add additional storage requirements for the checksum, but
 I don't see another clean option.  If you can spare one byte, that gives
 you 255 times to turn on checksums;   after that, you have to
 dump/reload to use the checksum feature.

I like the idea very much actually. But I'll let you argue the case
for using pd_pagesize_version for that with your esteemed colleagues.

It would be pretty safe to just let it wrap.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

Robert Haas robertmh...@gmail.com writes:
 That's not why I want to leave that field alone, though: I want to
 leave that field alone for backward and forward compatibility, so that
 any version of community PostgreSQL ever released - and any page
 inspection tools, current or future - can look at the low-order byte
 of that field and know what page version they've got.

I've not been following this thread very closely, but FWIW I find the
above argument extremely compelling.  We could get away with relocating
the version identifier in the narrow context of an upgrade from PG 9.x
to 9.y, but the side effects for external tools such as pg_filedump
would be disastrous.

(And yeah, as maintainer for pg_filedump I'm rather biased.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database


Excerpts from Pavel Stehule's message of mar mar 06 14:57:30 -0300 2012:
 2012/3/6 Robert Haas robertmh...@gmail.com:
  On Tue, Mar 6, 2012 at 12:37 PM, Christopher Browne cbbro...@gmail.com 
  wrote:
  On Tue, Mar 6, 2012 at 12:20 PM, Artur Litwinowicz ad...@ybka.com wrote:
  Algorithm for first loop:
  check jobs exists and is time to run it
    run job as other sql statements (some validity check may be done)
    get next job
  no jobs - delay
 
  There are crucial things missing here, namely the need to establish at
  least one database connection in order to be able to check for the
  existence of jobs, as well as to establish additional connections as
  contexts in which to run jobs.
 
  That implies the need for some New Stuff that isn't quite the same as
  what we have within server processes today.
 
  There is nothing horrible about this; just that there's some extra
  mechanism that needs to come into existence in order to do this.
 
  And also some interface.  It'd be useful to have background jobs that
  executed either immediately or at a certain time or after a certain
  delay, as well as repeating jobs that execute at a certain interval or
  on a certain schedule.  Figuring out what all that should look like
  is, well, part of the work that someone has to do.
 
 +1

It seems to me that we could simply have some sort of external daemon
program running the schedule, i.e. starting up other programs or running
queries; that daemon would connect to the database somehow to fetch
tasks to run.  Separately a client program could be provided to program
tasks using a graphical interface, web, or whatever (more than one, if
we want to get fancy); this would also connect to the database and store
tasks to run by the daemon.  The client doesn't have to talk to the
daemon directly (we can simply have a trigger on the schedule table so
that the daemon receives a notify whenever the client changes stuff).

It seems to me that the only thing that needs core support is the
ability to start up the daemon when postmaster is ready to accept
queries, and shut the daemon down when postmaster kills backends (either
because one crashed, or because it's shutting down).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Merlin Moncure

On Tue, Mar 6, 2012 at 9:37 AM, Robert Haas robertmh...@gmail.com wrote:
 But having said that, it's not apparent to me why such a thing would
 need to live inside the database at all.  It's very easy to visualize
 a task scheduler that runs as a client and requires nothing new from the
 core code.  Approaching the problem that way would let the scheduler
 be an independent project that stands or falls on its own merits.

 I was trying to make a general comment about PostgreSQL development,
 without diving too far into the merits or demerits of this particular
 feature.  I suspect you'd agree with me that, in general, a lot of
 valuable things don't get done because there aren't enough people or
 enough hours in the day, and we can always use more contributors.

 But since you brought it up, I think there is a lot of value to having
 a scheduler that's integrated with the database.  There are many
 things that the database does which could also be done outside the
 database, but people want them in the database because it's easier
 that way.  If you have a web application that talks to the database,
 and which sometimes needs to schedule tasks to run at a future time,
 it is much nicer to do that by inserting a row into an SQL table
 somewhere, or executing some bit of DDL, than it is to do it by making
 your web application know how to connect to a PostgreSQL database and
 also how to rewrite crontab (in a concurrency-safe manner, no less).

The counter argument to this is that there's nothing keeping you from
layering your own scheduling system on top of cron.  Cron provides the
heartbeat -- everything else you build out with tables implementing a
work queue or whatever else comes to mind.

The counter-counter argument is that cron has a couple of annoying
limitations -- sub minute scheduling is not possible, lousy windows
support, etc.  It's pretty appealing that you would be able to back up
your database and get all your scheduling configuration back up with
it.  Dealing with cron is a headache for database administrators.

Personally I find the C-unixy way of solving this problem inside
postgres not worth chasing -- that really does belong outside and you
really are rewriting cron.  A (mostly) sql driven scheduler would be
pretty neat though.

I agree with Chris B upthread: I find that what people really need
here is stored procedures, or some way of being able to embed code in
the database that can manage it's own transactions.  That way your
server-side entry, dostuff() called every minute doesn't have to exit
to avoid accumulating locks for everything it needs to do or be broken
up into multiple independent entry points in scripts outside the
database.  Without SP though, you can still do it via 100% sql/plpgsql
using listen/notify and dblink for the AT workaround, and at least one
dedicated task runner.  By 'it' I mean a server side scheduling system
relying on a heartbeat from out of the database code.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Christopher Browne

On Tue, Mar 6, 2012 at 12:47 PM, Robert Haas robertmh...@gmail.com wrote:
 And also some interface.  It'd be useful to have background jobs that
 executed either immediately or at a certain time or after a certain
 delay, as well as repeating jobs that execute at a certain interval or
 on a certain schedule.  Figuring out what all that should look like
 is, well, part of the work that someone has to do.

Certainly.  It would seem to make sense to have a database schema
indicating this kind of metadata.

It needs to be sophisticated enough to cover *enough* unusual cases.

A schema duplicating crontab might look something like:
create table cron (
  id serial primary key,
  minutes integer[],
  hours text integer[],
  dayofmonth integer[],
  month integer[],
  dayofweek integer[],
  command text
);

That's probably a bit too minimalist, and that only properly supports
one user's crontab.

The schema needs to include things like:
a) When to perform actions.  Several bases for this, including
time-based, event-based.
b) What actions to perform (including context as to database user,
search_path, desired UNIX $PWD, perhaps more than that)
c) Sequencing information, including what jobs should NOT be run concurrently.
d) Logging.  If a job succeeds, that should be noted.  If it fails,
that should be noted.  Want to know start + end times.
e) What to do on failure.  Everything blows up is not a good answer :-).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Jaime Casanova

On Tue, Mar 6, 2012 at 1:14 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 It seems to me that the only thing that needs core support is the
 ability to start up the daemon when postmaster is ready to accept
 queries, and shut the daemon down when postmaster kills backends (either
 because one crashed, or because it's shutting down).


+1

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

Peter Eisentraut pete...@gmx.net writes:
 On tor, 2012-03-01 at 20:56 +0900, Shigeru Hanada wrote:
 How about moving postgresql_fdw_validator into dblink,

 That's probably a good move.  If this were C++, we might try to subclass
 this whole thing a bit, to avoid code duplication, but I don't see an
 easy way to do that here.

 with renaming to dblink_fdw_validator? 

 Well, it's not the validator of the dblink_fdw, so maybe something like
 basic_postgresql_fdw_validator.

I don't understand this objection.  If we move it into dblink, then it
*is* dblink's validator, and nobody else's.

A bigger issue with postgresql_fdw_validator is that it supposes that
the core backend is authoritative as to what options libpq supports,
which is bad design on its face.  It would be much more sensible for
dblink to be asking libpq what options libpq supports, say via
PQconndefaults().

We might find that we have to leave postgresql_fdw_validator as-is
for backwards compatibility reasons (in particular, being able to load
existing FDW definitions) but I think we should migrate away from using
it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 06, 2012 at 06:00:13PM +, Simon Riggs wrote:
 On Tue, Mar 6, 2012 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:
 
  One crazy idea would be to have a checksum _version_ number somewhere on
  the page and in pg_controldata.  When you turn on checksums, you
  increment that value, and all new checksum pages get that checksum
  version;  if you turn off checksums, we just don't check them anymore,
  but they might get incorrect due to a hint bit write and a crash.  When
  you turn on checksums again, you increment the checksum version again,
  and only check pages having the _new_ checksum version.
 
  Yes, this does add additional storage requirements for the checksum, but
  I don't see another clean option.  If you can spare one byte, that gives
  you 255 times to turn on checksums;   after that, you have to
  dump/reload to use the checksum feature.
 
 I like the idea very much actually. But I'll let you argue the case
 for using pd_pagesize_version for that with your esteemed colleagues.
 
 It would be pretty safe to just let it wrap.

How would we know there are not old unwritten pages sitting around?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] patch for a locale-specific bug in regression tests (REL9_1_STABLE)

2012-03-06 Thread Tomas Vondra

Hi,

I've noticed a locale-specific bug in regression tests, I discovered
thanks to the new magpie buildfarm member (testing cs_CZ locale).
The problem is in foreign_data where the output is sorted by a column,
and cs_CZ behaves differently from C and en_US.

More precisely, in C it's true that ('s4'  'sc') but that's not true
in cs_CZ (and supposedly some other locales).

I've fixed this by replacing 'sc' with 't0' which seems to fix the
ordering (and should work with other locales too). See the patch attached.

kind regards
Tomas
diff --git a/src/test/regress/expected/foreign_data.out 
b/src/test/regress/expected/foreign_data.out
index 2b3eddf..3f6bce2 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -631,25 +631,25 @@ DROP SERVER s7;
 
 -- CREATE FOREIGN TABLE
 CREATE SCHEMA foreign_schema;
-CREATE SERVER sc FOREIGN DATA WRAPPER dummy;
+CREATE SERVER t0 FOREIGN DATA WRAPPER dummy;
 CREATE FOREIGN TABLE ft1 ();-- ERROR
 ERROR:  syntax error at or near ;
 LINE 1: CREATE FOREIGN TABLE ft1 ();
^
 CREATE FOREIGN TABLE ft1 () SERVER no_server;   -- ERROR
 ERROR:  server no_server does not exist
-CREATE FOREIGN TABLE ft1 (c1 serial) SERVER sc; -- ERROR
+CREATE FOREIGN TABLE ft1 (c1 serial) SERVER t0; -- ERROR
 NOTICE:  CREATE FOREIGN TABLE will create implicit sequence ft1_c1_seq for 
serial column ft1.c1
 ERROR:  default values on foreign tables are not supported
-CREATE FOREIGN TABLE ft1 () SERVER sc WITH OIDS;-- ERROR
+CREATE FOREIGN TABLE ft1 () SERVER t0 WITH OIDS;-- ERROR
 ERROR:  syntax error at or near WITH OIDS
-LINE 1: CREATE FOREIGN TABLE ft1 () SERVER sc WITH OIDS;
+LINE 1: CREATE FOREIGN TABLE ft1 () SERVER t0 WITH OIDS;
   ^
 CREATE FOREIGN TABLE ft1 (
c1 integer NOT NULL,
c2 text,
c3 date
-) SERVER sc OPTIONS (delimiter ',', quote '');
+) SERVER t0 OPTIONS (delimiter ',', quote '');
 COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
 COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
 \d+ ft1
@@ -659,14 +659,14 @@ COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
  c1 | integer | not null  | plain| ft1.c1
  c2 | text|   | extended | 
  c3 | date|   | plain| 
-Server: sc
+Server: t0
 Has OIDs: no
 
 \det+
 List of foreign tables
  Schema | Table | Server |  Options   
 +---++
- public | ft1   | sc | {delimiter=,,quote=\}
+ public | ft1   | t0 | {delimiter=,,quote=\}
 (1 row)
 
 CREATE INDEX id_ft1_c2 ON ft1 (c2); -- ERROR
@@ -737,7 +737,7 @@ Foreign table foreign_schema.foreign_table_1
  c7   | integer | 
  c8   | text| 
  c10  | integer | 
-Server: sc
+Server: t0
 
 -- Information schema
 SELECT * FROM information_schema.foreign_data_wrappers ORDER BY 1, 2;
@@ -761,7 +761,7 @@ SELECT * FROM information_schema.foreign_servers ORDER BY 
1, 2;
  regression | s5  | regression   | 
foo   | | 15.0   | 
regress_test_role
  regression | s6  | regression   | 
foo   | | 16.0   | 
regress_test_indirect
  regression | s8  | regression   | 
postgresql| || 
foreign_data_user
- regression | sc  | regression   | 
dummy | || 
foreign_data_user
+ regression | t0  | regression   | 
dummy | || 
foreign_data_user
  regression | t1  | regression   | 
foo   | || 
regress_test_indirect
  regression | t2  | regression   | 
foo   | || 
regress_test_role
 (7 rows)
@@ -823,7 +823,7 @@ SELECT * FROM information_schema.role_usage_grants WHERE 
object_type LIKE 'FOREI
 SELECT * FROM information_schema.foreign_tables ORDER BY 1, 2, 3;
  foreign_table_catalog | foreign_table_schema | foreign_table_name | 
foreign_server_catalog | foreign_server_name 
 
---+--+++-
- regression| foreign_schema   | foreign_table_1| 
regression | sc
+ regression| foreign_schema   | foreign_table_1| 
regression | t0
 (1 row)

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Josh Berkus


 It seems to me that the only thing that needs core support is the
 ability to start up the daemon when postmaster is ready to accept
 queries, and shut the daemon down when postmaster kills backends (either
 because one crashed, or because it's shutting down).

I think this could be addressed simply by the ability to call actions at
a predefined interval, i.e.:

CREATE RECURRING JOB {job_name}
FOR EACH {interval}
[ STARTING {timestamptz} ]
[ ENDING {timestamptz} ]
EXECUTE PROCEDURE {procedure name}

CREATE RECURRING JOB {job_name}
FOR EACH {interval}
[ STARTING {timestamptz} ]
[ ENDING {timestamptz} ]
EXECUTE STATEMENT 'some statement'

(obviously, we'd want to adjust the above to use existing reserved
words, but you get the idea)

Activity and discretion beyond that could be defined in PL code,
including run/don't run conditions, activities, and dependancies.  The
only thing Postgres doesn't currently have is a clock which fires
events.  Anything we try to implement which is more complex than the
above is going to not work for someone.  And the pg_agent could be
adapted easily to use the Postgres clock instead of cron.

Oh, and the ability to run VACUUM inside a larger statement in some way.
 But that's a different TODO.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logging in high performance systems.

Robert Haas robertmh...@gmail.com writes:
 I would dismissed this out of hand at this if you said it a year ago,
 but I'm older and wiser now.  At some point this cycle, I did some
 benchmarking of the subtransaction abort path, since the slowness of
 things like EXCEPTION blocks in PL/pgsql is a known sore point.  I
 don't remember the exact numbers anymore, but I do remember the
 general picture, which is that constructing the error message is
 shockingly expensive compared to anything else that we do in that
 path.  I dropped it at that point for lack of good ideas: it would be
 awfully nice to postpone the error message construction until we know
 that it's actually needed, but I don't see any clean (or even messy)
 way of doing that.

I came across this thread while looking back to see if there was
anything relevant to Martin Pihlak's logging patch.  It doesn't
seem to be a reason not to commit his patch, but I was a bit struck
by your comment about the cost of generating error messages.  We
already knew that was expensive, which is why elog.c has provisions
to not do it if the message isn't going to be printed.  I am wondering
exactly what case you were looking at in the above testing --- was it
plpgsql exception blocks?  If so, I wonder if there'd be any mileage in
trying to postpone error message processing until we see if the user
actually asks for the text of the message.  The SQLERRM variable would
have to become even more magic than it is now for that to work, but
since it's a wart already, maybe that's not a big objection.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 5:56 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Mar 05, 2012 at 03:03:18PM +, Simon Riggs wrote:
 To avoid any confusion as to where this proposed feature is now, I'd
 like to summarise my understanding, make proposals and also request
 clear feedback on them.

 Checksums have a number of objections to them outstanding.

 1. We don't need them because there will be something better in a
 later release. I don't think anybody disagrees that a better solution
 is possible in the future; doubts have been expressed as to what will
 be required and when that is likely to happen. Opinions differ. We can
 and should do something now unless there is reason not to.

 Obviously, one reason not to do this now is that we are way past time to
 be designing any feature.  As much as I like how it has progressed and
 how it handles pg_upgrade issues, I don't think anyone can state that
 this feature is ready to go, and considering how far we are into the
 last commit-fest, I think we can fairly say this patch has gotten good
 review and return it with feedback.  We can keep discussing it (and I
 just posted some ideas myself), but I don't think we can any longer
 pretend that this is going into Postgres 9.2.

Are you calling time on all patches in this CF, or just this one?

From what you say, this seems to be the reason for your thinking:

On Tue, Mar 6, 2012 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:

 I think the turning checksums on/off/on/off is really a killer
 problem, and obviously many of the actions needed to make it safe make
 the checksum feature itself less useful.

The problem is actually on/off/crash/on in quick succession which is
much less likely.

In the current patch that can be resolved by running a VACUUM to
remove checksums if you want to turn them off completely and safely.
That is easily documented as a procedure for people to follow, like
creating types or setting up replication.

The resolution suggested to that problem is to force a VACUUM to turn
on or turn off. There's very little difference between those two
things other than us forcing the user, which as you point out, being
forced to do that could be worse than the problem we're trying to
solve. From this discussion, ISTM that the force route has
sufficient complexity that we could easily make it less robust and
much more likely to receive criticism.

You can break replication by doing on/off/on as well, for example.

It's simply not a killer issue. Not if you have a reasonable grading
of issues. It's a possibility of a usage annoyance, much less annoying
than having an XML type that doesn't actually allow indexes without
knowledge of C, a hash index that doesn't ever recover or an unlogged
table whose data suddenly disappears after a crash, for example.

ISTM pretty reasonable to document the issue clearly, inform people
how it works and let them choose whether to use it or not. Stopping
features from going out because some people might misuse them isn't a
great plan.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 06, 2012 at 01:52:31PM -0500, Bruce Momjian wrote:
 On Tue, Mar 06, 2012 at 06:00:13PM +, Simon Riggs wrote:
  On Tue, Mar 6, 2012 at 5:50 PM, Bruce Momjian br...@momjian.us wrote:
  
   One crazy idea would be to have a checksum _version_ number somewhere on
   the page and in pg_controldata.  When you turn on checksums, you
   increment that value, and all new checksum pages get that checksum
   version;  if you turn off checksums, we just don't check them anymore,
   but they might get incorrect due to a hint bit write and a crash.  When
   you turn on checksums again, you increment the checksum version again,
   and only check pages having the _new_ checksum version.
  
   Yes, this does add additional storage requirements for the checksum, but
   I don't see another clean option.  If you can spare one byte, that gives
   you 255 times to turn on checksums;   after that, you have to
   dump/reload to use the checksum feature.
  
  I like the idea very much actually. But I'll let you argue the case
  for using pd_pagesize_version for that with your esteemed colleagues.
  
  It would be pretty safe to just let it wrap.
 
 How would we know there are not old unwritten pages sitting around?

Perhaps a full xid wrap-around would allow us to re-use checksum
counters.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 06, 2012 at 07:09:23PM +, Simon Riggs wrote:
 On Tue, Mar 6, 2012 at 5:56 PM, Bruce Momjian br...@momjian.us wrote:
  On Mon, Mar 05, 2012 at 03:03:18PM +, Simon Riggs wrote:
  To avoid any confusion as to where this proposed feature is now, I'd
  like to summarise my understanding, make proposals and also request
  clear feedback on them.
 
  Checksums have a number of objections to them outstanding.
 
  1. We don't need them because there will be something better in a
  later release. I don't think anybody disagrees that a better solution
  is possible in the future; doubts have been expressed as to what will
  be required and when that is likely to happen. Opinions differ. We can
  and should do something now unless there is reason not to.
 
  Obviously, one reason not to do this now is that we are way past time to
  be designing any feature.  As much as I like how it has progressed and
  how it handles pg_upgrade issues, I don't think anyone can state that
  this feature is ready to go, and considering how far we are into the
  last commit-fest, I think we can fairly say this patch has gotten good
  review and return it with feedback.  We can keep discussing it (and I
  just posted some ideas myself), but I don't think we can any longer
  pretend that this is going into Postgres 9.2.
 
 Are you calling time on all patches in this CF, or just this one?
 
 From what you say, this seems to be the reason for your thinking:

I am calling time on any CF patch that requires redesign.

  I think the turning checksums on/off/on/off is really a killer
  problem, and obviously many of the actions needed to make it safe make
  the checksum feature itself less useful.
 
 The problem is actually on/off/crash/on in quick succession which is
 much less likely.

True.

 In the current patch that can be resolved by running a VACUUM to
 remove checksums if you want to turn them off completely and safely.
 That is easily documented as a procedure for people to follow, like
 creating types or setting up replication.
 
 The resolution suggested to that problem is to force a VACUUM to turn
 on or turn off. There's very little difference between those two
 things other than us forcing the user, which as you point out, being
 forced to do that could be worse than the problem we're trying to
 solve. From this discussion, ISTM that the force route has
 sufficient complexity that we could easily make it less robust and
 much more likely to receive criticism.
 
 You can break replication by doing on/off/on as well, for example.
 
 It's simply not a killer issue. Not if you have a reasonable grading
 of issues. It's a possibility of a usage annoyance, much less annoying
 than having an XML type that doesn't actually allow indexes without
 knowledge of C, a hash index that doesn't ever recover or an unlogged
 table whose data suddenly disappears after a crash, for example.
 
 ISTM pretty reasonable to document the issue clearly, inform people
 how it works and let them choose whether to use it or not. Stopping
 features from going out because some people might misuse them isn't a
 great plan.

The feature is no where near complete, and we should not be designing
features at this stage.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-03-06 Thread Heikki Linnakangas


On 06.03.2012 17:12, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 06.03.2012 14:52, Fujii Masao wrote:

This also strikes me that the usage of the spinlock insertpos_lck might
not be OK in ReserveXLogInsertLocation() because a few dozen instructions
can be performed while holding the spinlock



I admit that block is longer than any of our existing spinlock blocks.
However, it's important for performance. I tried using a lwlock earlier,
and that negated the gains. So if that's a serious objection, then let's
resolve that now before I spend any more time on other aspects of the
patch. Any ideas how to make that block shorter?


How long is the current locked code exactly --- does it contain a loop?


Perhaps best if you take a look for yourself, the function is called 
ReserveXLogInsertLocation() in patch. It calls a helper function called 
 AdvanceXLogRecPtrToNextPage(ptr), which is small and could be inlined. 
It does contain one loop, which iterates once for every WAL page the 
record crosses.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.2] Confusion over CacheRegisterSyscacheCallback

2012-03-06 Thread Marko Kreen

On Tue, Mar 06, 2012 at 11:10:38AM -0500, Tom Lane wrote:
 Marko Kreen mark...@gmail.com writes:
  So my question is that after doing generic SearchSysCache() like:
 
 tup = SearchSysCache(USERMAPPINGUSERSERVER,
  ObjectIdGetDatum(user_mapping-userid),
  ObjectIdGetDatum(foreign_server-serverid),
  0, 0);
 
  what is the proper way to calculate the hashValue that 
  will be given to callback?
 
 Why would you need to know that?  The reason the calculation function
 is static is that there's no apparent need to expose that information
 outside the syscache subsystem.

Because I need to invalidate my own internal state that corresponds
to particular system catalog row?

In current case (plproxy) I need to invalidate libpq connections
that are created from particular foreign server entry.

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012:

 It does however, illustrate my next review comment which is that the
 comments and README items are sorely lacking here. It's quite hard to
 see how it works, let along comment on major design decisions. It
 would help myself and others immensely if we could improve that.

Here's a first attempt at a README illustrating this.  I intend this to
be placed in src/backend/access/heap/README.tuplock; the first three
paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
those from there, directing people to this new file instead.  Is there
something that you think should be covered more extensively (or at all)
here?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Locking tuples
--

Because the shared-memory lock table is of finite size, but users could
reasonably want to lock large numbers of tuples, we do not rely on the
standard lock manager to store tuple-level locks over the long term.  Instead,
a tuple is marked as locked by setting the current transaction's XID as its
XMAX, and setting additional infomask bits to distinguish this usage from the
more normal case of having deleted the tuple.  When multiple transactions
concurrently lock a tuple, a MultiXact is used; see below.

When it is necessary to wait for a tuple-level lock to be released, the basic
delay is provided by XactLockTableWait or MultiXactIdWait on the contents of
the tuple's XMAX.  However, that mechanism will release all waiters
concurrently, so there would be a race condition as to which waiter gets the
tuple, potentially leading to indefinite starvation of some waiters.  The
possibility of share-locking makes the problem much worse --- a steady stream
of share-lockers can easily block an exclusive locker forever.  To provide
more reliable semantics about who gets a tuple-level lock first, we use the
standard lock manager.  The protocol for waiting for a tuple-level lock is
really

 LockTuple()
 XactLockTableWait()
 mark tuple as locked by me
 UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next
is provided by LockTuple().  However, at most one tuple-level lock will
be held or awaited per backend at any time, so we don't risk overflow
of the lock table.  Note that incoming share-lockers are required to
do LockTuple as well, if there is any conflict, to ensure that they don't
starve out waiting exclusive-lockers.  However, if there is not any active
conflict for a tuple, we don't incur any extra overhead.

We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
super-exclusive locking (used to delete tuples and more generally to update
tuples modifying the values of the columns that make up the key of the tuple);
SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
UPDATE.  This last mode implements a mode just strong enough to implement RI
checks, i.e. it ensures that tuples do not go away from under a check, without
blocking when some other transaction that want to update the tuple without
changing its key.

The conflict table is:

KEY UPDATEUPDATESHAREKEY SHARE
KEY UPDATE   conflictconflict  conflict  conflict
UPDATE   conflictconflict  conflict
SHAREconflictconflict
KEY SHAREconflict

When there is a single locker in a tuple, we can just store the locking info
in the tuple itself.  We do this by storing the locker's Xid in XMAX, and
setting hint bits specifying the locking strength.  There is one exception
here: since hint bit space is limited, we do not provide a separate hint bit
for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
that case.  (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
presumably more commonly used due to being the standards-mandated locking
mechanism, or heavily used by the RI code, so we want to provide fast paths
for those.)

MultiXacts
--

A tuple header provides very limited space for storing information about tuple
locking and updates: there is room only for a single Xid and a small number of
hint bits.  Whenever we need to store more than one lock, we replace the first
locker's Xid with a new MultiXactId.  Each MultiXact provides extended locking
data; it comprises an array of Xids plus some flags bits for each one.  The
flags are currently used to store the locking strength of each member
transaction.  (The flags also distinguish a pure locker from an actual
updater.)

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
locked in shared mode by multiple

Re: [HACKERS] foreign key locks, 2nd attempt

On Mon, Mar 5, 2012 at 8:35 PM, Simon Riggs si...@2ndquadrant.com wrote:

 * Why do we need multixact to be persistent? Do we need every page of
 multixact to be persistent, or just particular pages in certain
 circumstances?

 Any page that contains at least one multi with an update as a member
 must persist.  It's possible that some pages contain no update (and this
 is even likely in some workloads, if updates are rare), but I'm not sure
 it's worth complicating the code to cater for early removal of some
 pages.

 If the multixact contains an xid and that is being persisted then you
 need to set an LSN to ensure that a page writes causes an XLogFlush()
 before the multixact write. And you need to set do_fsync, no? Or
 explain why not in comments...

 I was really thinking we could skip the fsync of a page if we've not
 persisted anything important on that page, since that was one of
 Robert's performance points.

We need to increase these values to 32 as well

 #define NUM_MXACTOFFSET_BUFFERS8
 #define NUM_MXACTMEMBER_BUFFERS16

using same logic as for clog.

We're using 25% more space and we already know clog benefits from
increasing them, so there's little doubt we need it here also, since
we are increasing the access rate and potentially the longevity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

On Tue, Mar 6, 2012 at 12:50 PM, Bruce Momjian br...@momjian.us wrote:
 I think the turning checksums on/off/on/off is really a killer
 problem, and obviously many of the actions needed to make it safe make
 the checksum feature itself less useful.

 One crazy idea would be to have a checksum _version_ number somewhere on
 the page and in pg_controldata.  When you turn on checksums, you
 increment that value, and all new checksum pages get that checksum
 version;  if you turn off checksums, we just don't check them anymore,
 but they might get incorrect due to a hint bit write and a crash.  When
 you turn on checksums again, you increment the checksum version again,
 and only check pages having the _new_ checksum version.

 Yes, this does add additional storage requirements for the checksum, but
 I don't see another clean option.  If you can spare one byte, that gives
 you 255 times to turn on checksums;   after that, you have to
 dump/reload to use the checksum feature.

I don't see what problem that solves.  It's just taking the problem we
already have and a new failure mode (out of checksum versions) on top
of it.  If you see a page with checksum version 153 and the current
checksum version is 152, then you know that either (a) it is the
result of a previous iteration of turning checksums on or (b) the
checksum version number got corrupted.  This is the exact same problem
we have with using a PD_HAS_CHECKSUM bit.  If the bit is not set, then
you know that either (a) it hasn't been checksummed yet or (b) the bit
got corrupted.   In either case, a single poorly placed bit-flip gives
rise to the exact same confusion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

So there are 4 lock types, but we only have room for 3 on the tuple
header, so we store the least common/deprecated of the 4 types as a
multixactid. Some rewording would help there.

Neat scheme!


My understanding is that all of theses workloads will change

* Users of explicit SHARE lockers will be slightly worse in the case
of the 1st locker, but then after that they'll be the same as before.

* Updates against an RI locked table will be dramatically faster
because of reduced lock waits


...and that these previous workloads are effectively unchanged:

* Stream of RI checks causes mxacts

* Multi row deadlocks still possible

* Queues of writers still wait in the same way

* Deletes don't cause mxacts unless by same transaction



 In earlier PostgreSQL releases, a MultiXact always meant that the tuple was
 locked in shared mode by multiple transactions.  This is no longer the case; a
 MultiXact may contain an update or delete Xid.  (Keep in mind that tuple locks
 in a transaction do not conflict with other tuple locks in the same
 transaction, so it's possible to have otherwise conflicting locks in a
 MultiXact if they belong to the same transaction).

Somewhat confusing, but am getting there.

 Note that each lock is attributed to the subtransaction that acquires it.
 This means that a subtransaction that aborts is seen as though it releases the
 locks it acquired; concurrent transactions can then proceed without having to
 wait for the main transaction to finish.  It also means that a subtransaction
 can upgrade to a stronger lock level than an earlier transaction had, and if
 the subxact aborts, the earlier, weaker lock is kept.

OK

 The possibility of having an update within a MultiXact means that they must
 persist across crashes and restarts: a future reader of the tuple needs to
 figure out whether the update committed or aborted.  So we have a requirement
 that pg_multixact needs to retain pages of its data until we're certain that
 the MultiXacts in them are no longer of interest.

I think the no longer of interest aspect needs to be tracked more
closely because it will necessarily lead to more I/O.

If we store the LSN on each mxact page, as I think we need to, we can
get rid of pages more quickly if we know they don't have an LSN set.
So its possible we can optimise that more.

 VACUUM is in charge of removing old MultiXacts at the time of tuple freezing.

You mean mxact segments?

Surely we set hint bits on tuples same as now? Hope so.

 This works in the same way that pg_clog segments are removed: we have a
 pg_class column that stores the earliest multixact that could possibly be
 stored in the table; the minimum of all such values is stored in a pg_database
 column.  VACUUM computes the minimum across all pg_database values, and
 removes pg_multixact segments older than the minimum.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

2012-03-06 Thread Peter Eisentraut

On mån, 2012-03-05 at 19:37 -0500, Bruce Momjian wrote:
 The exact case is that the user was using plpython2u in PG 9.0, but
 the PG 9.1 one-click installer only supplies plpython3u.

That seems like a pretty stupid choice to me, if it's true.

That doesn't address your issue, but users shouldn't be forced to drop
their languages during an upgrade in the first place.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Checksums, state of play

2012-03-06 Thread Marcin Mańk

On Tue, Mar 06, 2012 at 07:09:23PM +, Simon Riggs wrote:
 The problem is actually on/off/crash/on in quick succession which is
 much less likely.

I must be missing something, but how about:
if (!has_checksums  page_loses_checksum_due_to_hint_bit_write)
 wal_log_the_hint_bit_write();

Greetings
Marcin Mańk

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 06.03.2012 17:12, Tom Lane wrote:
 How long is the current locked code exactly --- does it contain a loop?

 Perhaps best if you take a look for yourself, the function is called 
 ReserveXLogInsertLocation() in patch. It calls a helper function called 
   AdvanceXLogRecPtrToNextPage(ptr), which is small and could be inlined. 
 It does contain one loop, which iterates once for every WAL page the 
 record crosses.

Hm.  The loop makes me a tad uncomfortable, because it is possible for
WAL records to be very long (many pages).  I see the point that
replacing the spinlock with an LWLock would likely negate any
performance win from this patch, but having other processes arrive and
spin while somebody is busy calculating the size of a multi-megabyte
commit record would be bad too.

What I suggest is that it should not be necessary to crawl forward one
page at a time to figure out how many pages will be needed to store N
bytes worth of WAL data.  You're basically implementing a division
problem as repeated subtraction.  Getting the extra WAL-segment-start
overhead right would be slightly tricky; but even if you didn't want to
try to make it pure straight-line code, at the very least it seems like
you could set it up so that the loop iterates only once per segment not
page.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

Martin Pihlak martin.pih...@gmail.com writes:
 Updated patch attached.

Applied with minor editorialization (mainly just improving the
comments).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

Preliminary comment:

This README is very helpful.

On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, without
 blocking when some other transaction that want to update the tuple without
 changing its key.

I feel like there is a naming problem here.  The semantics that have
always been associated with SELECT FOR UPDATE are now attached to
SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
 I think users will be surprised to find that SELECT FOR UPDATE
doesn't block all concurrent updates.

It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
UPDATE, and what you're calling SELECT FOR UPDATE should be called
something else - essentially NONKEY UPDATE, though I don't much like
that name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

On Tue, Mar 6, 2012 at 3:28 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-03-05 at 19:37 -0500, Bruce Momjian wrote:
 The exact case is that the user was using plpython2u in PG 9.0, but
 the PG 9.1 one-click installer only supplies plpython3u.

 That seems like a pretty stupid choice to me, if it's true.

 That doesn't address your issue, but users shouldn't be forced to drop
 their languages during an upgrade in the first place.

Hmm.  I had been thinking that it is only possible to support one or
the other, thus the need for a compatibility break at some point would
be forced, but reading the documentation, it seems that it we can ship
both as long as nobody tries to use both at the same time.  I wonder
why we didn't do that.

Dave?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-06 Thread Thom Brown

On 6 March 2012 21:04, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 [CASCADE will not run the command triggers for cascaded objects]
 If these are all expected, does it in any way compromise the
 effectiveness of DDL triggers in major use-cases?

 I don't think so.  When replicating the replica will certainly drop the
 same set of dependent objects, for example.  Auditing is another story.
 Do we want to try having cascaded object support in drop commands?

I wasn't sure if auditing was one of the rationale behind the feature
or not.  If it is, it presents a major problem.  How does the replica
know that the objects were dropped?

Thanks for the updated patch and the quick turnaround time.  I'll give
it another review.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to know a table has been modified?

2012-03-06 Thread Dimitri Fontaine

Tatsuo Ishii is...@postgresql.org writes:
 I'm working on implementing query cache in pgpool-II. I want to know
 if a table has been modified because pgpool-II has to invalidate cache
 if corresponding table is modified. For DDL/DML it would be doable
 since pgpool-II knows all SQLs sent from clients. Problem is, implicit
 table modifications done by CASCADE, TRIGGERS and so on.

Some of that (triggers) is provided in the command triggers patch. The
CASCADE not so much but your command trigger will get called on the
top-level object.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.2] Confusion over CacheRegisterSyscacheCallback

Marko Kreen mark...@gmail.com writes:
 On Tue, Mar 06, 2012 at 11:10:38AM -0500, Tom Lane wrote:
 Why would you need to know that?  The reason the calculation function
 is static is that there's no apparent need to expose that information
 outside the syscache subsystem.

 Because I need to invalidate my own internal state that corresponds
 to particular system catalog row?

 In current case (plproxy) I need to invalidate libpq connections
 that are created from particular foreign server entry.

[ shrug... ] You could just flush 'em all, which is what most existing
inval callbacks do.  Admittedly a libpq connection is a bit more
expensive than the average bit of invalidatable state, but how often
does pg_foreign_server get updated?

Or you could do like setrefs.c does, and assume you know how to
calculate the hash value for an OID-keyed cache.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt


Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:
 
 Preliminary comment:
 
 This README is very helpful.

Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.
 
 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
block all updates.  It continues to block SELECT FOR SHARE as well.
The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
since those didn't exist before, it doesn't seem correct to consider
that SELECT FOR UPDATE changed in any way.

The main difference in the UPDATE behavior is that an UPDATE is regarded
as though it might acquire two different lock modes -- it either
acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
consider that previous to this patch, what UPDATE did was always acquire
a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
weakened.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logging in high performance systems.

On Tue, Mar 6, 2012 at 2:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I would dismissed this out of hand at this if you said it a year ago,
 but I'm older and wiser now.  At some point this cycle, I did some
 benchmarking of the subtransaction abort path, since the slowness of
 things like EXCEPTION blocks in PL/pgsql is a known sore point.  I
 don't remember the exact numbers anymore, but I do remember the
 general picture, which is that constructing the error message is
 shockingly expensive compared to anything else that we do in that
 path.  I dropped it at that point for lack of good ideas: it would be
 awfully nice to postpone the error message construction until we know
 that it's actually needed, but I don't see any clean (or even messy)
 way of doing that.

 I came across this thread while looking back to see if there was
 anything relevant to Martin Pihlak's logging patch.  It doesn't
 seem to be a reason not to commit his patch, but I was a bit struck
 by your comment about the cost of generating error messages.  We
 already knew that was expensive, which is why elog.c has provisions
 to not do it if the message isn't going to be printed.  I am wondering
 exactly what case you were looking at in the above testing --- was it
 plpgsql exception blocks?  If so, I wonder if there'd be any mileage in
 trying to postpone error message processing until we see if the user
 actually asks for the text of the message.  The SQLERRM variable would
 have to become even more magic than it is now for that to work, but
 since it's a wart already, maybe that's not a big objection.

/me looks to see if he still has the test case.

Yep, I do.  Attached.

A patch I was noodling around with, but ultimately was somewhat
underwhelmed by, also attached.  IIRC it helps, but not a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


try_but_fail.sql
Description: Binary data


expand-fmt-string-faster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Trigger execution role

2012-03-06 Thread Dimitri Fontaine

Tom Lane t...@sss.pgh.pa.us writes:
 This might be something to consider in the adjacent thread about command
 triggers, too --- who do they run as, and if it's not the calling user,
 how do they find out who that is?

As of now, calling user (we just calling a function), or another user if
the function is SECURITY DEFINER. Also, the current patch makes command
triggers superuser only.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

On Tue, Mar 6, 2012 at 9:10 PM, Robert Haas robertmh...@gmail.com wrote:
 Preliminary comment:

 This README is very helpful.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
 super-exclusive locking (used to delete tuples and more generally to update
 tuples modifying the values of the columns that make up the key of the 
 tuple);
 SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
 implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
 mode
 that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY
 UPDATE.  This last mode implements a mode just strong enough to implement RI
 checks, i.e. it ensures that tuples do not go away from under a check, 
 without
 blocking when some other transaction that want to update the tuple without
 changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR
 UPDATE, and what you're calling SELECT FOR UPDATE should be called
 something else - essentially NONKEY UPDATE, though I don't much like
 that name.

No, because that would stop it from doing what it is designed to do.

The lock modes are correct, appropriate and IMHO have meaningful
names. No redesign required here.

Not sure about the naming of some of the flag bits however.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Initial 9.2 pgbench write results

2012-03-06 Thread Jeff Janes

On Tue, Feb 28, 2012 at 9:49 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 28, 2012 at 11:46 AM, Robert Haas robertmh...@gmail.com wrote:

 This is an interesting hypothesis which I think we can test.  I'm
 thinking of writing a quick patch (just for testing, not for commit)
 to set a new buffer flag BM_BGWRITER_CLEANED to every buffer the
 background writer cleans.   Then we can keep a count of how often such
 buffers are dirtied before they're evicted, vs. how often they're
 evicted before they're dirtied.  If any significant percentage of them
 are redirtied before they're evicted, that would confirm this
 hypothesis.  At any rate I think the numbers would be interesting to
 see.

 Patch attached.

...
 That doesn't look bad at all.  Then I reset the stats, tried it again,
 and got this:

 LOG:  bgwriter_clean: 3863 evict-before-dirty, 198 dirty-before-evict
 LOG:  bgwriter_clean: 3861 evict-before-dirty, 199 dirty-before-evict
 LOG:  bgwriter_clean: 3978 evict-before-dirty, 218 dirty-before-evict
 LOG:  bgwriter_clean: 3928 evict-before-dirty, 204 dirty-before-evict
 LOG:  bgwriter_clean: 3956 evict-before-dirty, 207 dirty-before-evict
 LOG:  bgwriter_clean: 3906 evict-before-dirty, 222 dirty-before-evict
 LOG:  bgwriter_clean: 3912 evict-before-dirty, 197 dirty-before-evict
 LOG:  bgwriter_clean: 3853 evict-before-dirty, 200 dirty-before-evict

 OK, that's not so good, but I don't know why it's different.

I don't think reseting the stats has anything to do with it, it is
just that the shared_buffers warmed up over time.

On my testing, this dirty-before-evict is because the bgwriter is
riding too far ahead of the clock sweep, because of
scan_whole_pool_milliseconds.  Because it is far ahead, that leaves a
lot of run between the two pointers for re-dirtying cache hits to
land.

Not only is 2 minutes likely to be too small of a value for large
shared_buffers, but min_scan_buffers doesn't live up to its name.  It
is not the minimum buffers to scan, it is the minimum to find/make
reusable.  If lots of buffers have a nonzero usagecount (and if your
data doesn't fix in shared_buffers, it is hard to see how more than
half of the buffers can have zero usagecount) or are pinned, you are
scanning a lot more than min_scan_buffers.

If I disable that, then the bgwriter remains just in time, just
slightly ahead of the clock-sweep, and the dirty-before-evict drops a
lot.

If scan_whole_pool_milliseconds is to be used at all, it seems like it
should not be less than checkpoint_timeout.  If I don't want
checkpoints trashing my IO, why would I want someone else to do it
instead?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

On Tue, Mar 6, 2012 at 4:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012:

 Preliminary comment:

 This README is very helpful.

 Thanks.  I feel silly that I didn't write it earlier.

 On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
  super-exclusive locking (used to delete tuples and more generally to update
  tuples modifying the values of the columns that make up the key of the 
  tuple);
  SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
  implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
  mode
  that does not conflict with exclusive mode, but conflicts with SELECT FOR 
  KEY
  UPDATE.  This last mode implements a mode just strong enough to implement 
  RI
  checks, i.e. it ensures that tuples do not go away from under a check, 
  without
  blocking when some other transaction that want to update the tuple without
  changing its key.

 I feel like there is a naming problem here.  The semantics that have
 always been associated with SELECT FOR UPDATE are now attached to
 SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened.
  I think users will be surprised to find that SELECT FOR UPDATE
 doesn't block all concurrent updates.

 I'm not sure why you say that.  Certainly SELECT FOR UPDATE continues to
 block all updates.  It continues to block SELECT FOR SHARE as well.
 The things that it doesn't block are the new SELECT FOR KEY SHARE locks;
 since those didn't exist before, it doesn't seem correct to consider
 that SELECT FOR UPDATE changed in any way.

 The main difference in the UPDATE behavior is that an UPDATE is regarded
 as though it might acquire two different lock modes -- it either
 acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR
 UPDATE if not.  Since SELECT FOR KEY UPDATE didn't exist before, we can
 consider that previous to this patch, what UPDATE did was always acquire
 a lock of strength SELECT FOR UPDATE.  So UPDATE also hasn't been
 weakened.

Ah, I see.  My mistake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Dimitri Fontaine

Josh Berkus j...@agliodbs.com writes:
 Activity and discretion beyond that could be defined in PL code,
 including run/don't run conditions, activities, and dependancies.  The
 only thing Postgres doesn't currently have is a clock which fires
 events.  Anything we try to implement which is more complex than the
 above is going to not work for someone.  And the pg_agent could be
 adapted easily to use the Postgres clock instead of cron.

Oh, you mean like a ticker?  If only we knew about a project that did
implement a ticker, in C, using the PostgreSQL licence, and who's using
it in large scale production.  While at it, if such a ticker could be
used to implement job queues…

  https://github.com/markokr/skytools/tree/master/sql/ticker

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dropping PL language retains support functions

2012-03-06 Thread Peter Eisentraut

On tis, 2012-03-06 at 16:15 -0500, Robert Haas wrote:
 On Tue, Mar 6, 2012 at 3:28 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-03-05 at 19:37 -0500, Bruce Momjian wrote:
  The exact case is that the user was using plpython2u in PG 9.0, but
  the PG 9.1 one-click installer only supplies plpython3u.
 
  That seems like a pretty stupid choice to me, if it's true.
 
  That doesn't address your issue, but users shouldn't be forced to drop
  their languages during an upgrade in the first place.
 
 Hmm.  I had been thinking that it is only possible to support one or
 the other, thus the need for a compatibility break at some point would
 be forced, but reading the documentation, it seems that it we can ship
 both as long as nobody tries to use both at the same time.  I wonder
 why we didn't do that.

Even if only one version were allowed, it would have been way too early
to switch to Python 3.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Initial 9.2 pgbench write results

On Tue, Mar 6, 2012 at 4:35 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I don't think reseting the stats has anything to do with it, it is
 just that the shared_buffers warmed up over time.

Yes.

 On my testing, this dirty-before-evict is because the bgwriter is
 riding too far ahead of the clock sweep, because of
 scan_whole_pool_milliseconds.  Because it is far ahead, that leaves a
 lot of run between the two pointers for re-dirtying cache hits to
 land.

 Not only is 2 minutes likely to be too small of a value for large
 shared_buffers, but min_scan_buffers doesn't live up to its name.  It
 is not the minimum buffers to scan, it is the minimum to find/make
 reusable.  If lots of buffers have a nonzero usagecount (and if your
 data doesn't fix in shared_buffers, it is hard to see how more than
 half of the buffers can have zero usagecount) or are pinned, you are
 scanning a lot more than min_scan_buffers.

 If I disable that, then the bgwriter remains just in time, just
 slightly ahead of the clock-sweep, and the dirty-before-evict drops a
 lot.

 If scan_whole_pool_milliseconds is to be used at all, it seems like it
 should not be less than checkpoint_timeout.  If I don't want
 checkpoints trashing my IO, why would I want someone else to do it
 instead?

I'm not sure that 2 minutes is a bad value (although maybe it is) but
I think you've definitely got a good point as regards
min_scan_buffers.  It seems like the main LRU scan that begins here:

while (num_to_scan  0  reusable_buffers  upcoming_alloc_est)

Ought to be doing something like this:

while (num_to_scan  0  (reusable_buffers 
upcoming_alloc_est || num_already_scanned  min_scan_buffers))

...and the logic that changes upcoming_alloc_est based on
min_scan_buffers ought to be ripped out.  Unless I'm misunderstanding
this logic, this will cause the background writer to scan the buffer
pool considerably FASTER than once every two minutes when there are
lots of high-usage-count buffers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] review: CHECK FUNCTION statement

Hello

there is new version

* fixed small formatting issues related to drop SPI call
* long functions was divided
* CREATE TRIGGER ALL ON table support

Regards

Pavel


check_function-2012-03-06-3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Merlin Moncure

On Tue, Mar 6, 2012 at 3:44 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Josh Berkus j...@agliodbs.com writes:
 Activity and discretion beyond that could be defined in PL code,
 including run/don't run conditions, activities, and dependancies.  The
 only thing Postgres doesn't currently have is a clock which fires
 events.  Anything we try to implement which is more complex than the
 above is going to not work for someone.  And the pg_agent could be
 adapted easily to use the Postgres clock instead of cron.

 Oh, you mean like a ticker?  If only we knew about a project that did
 implement a ticker, in C, using the PostgreSQL licence, and who's using
 it in large scale production.  While at it, if such a ticker could be
 used to implement job queues…

  https://github.com/markokr/skytools/tree/master/sql/ticker

right -- exactly.  it would be pretty neat if the database exposed
this or a similar feature somehow -- perhaps by having the ticker send
a notify?  then a scheduler could sit on top of it without any
dependencies on the host operating system.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database


Excerpts from Dimitri Fontaine's message of mar mar 06 18:44:18 -0300 2012:
 Josh Berkus j...@agliodbs.com writes:
  Activity and discretion beyond that could be defined in PL code,
  including run/don't run conditions, activities, and dependancies.  The
  only thing Postgres doesn't currently have is a clock which fires
  events.  Anything we try to implement which is more complex than the
  above is going to not work for someone.  And the pg_agent could be
  adapted easily to use the Postgres clock instead of cron.
 
 Oh, you mean like a ticker?  If only we knew about a project that did
 implement a ticker, in C, using the PostgreSQL licence, and who's using
 it in large scale production.  While at it, if such a ticker could be
 used to implement job queues…
 
   https://github.com/markokr/skytools/tree/master/sql/ticker

Why do we need a ticker?  Just fetch the time of the task closest in the
future, and sleep till that time or a notify arrives (meaning schedule
change).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-06 Thread Merlin Moncure

On Tue, Mar 6, 2012 at 4:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Why do we need a ticker?  Just fetch the time of the task closest in the
 future, and sleep till that time or a notify arrives (meaning schedule
 change).

Because that can't be done in userland (at least, not without stored
procedures) since you'd have to keep an open running transaction while
sleeping.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] elegant and effective way for running jobs inside a database