Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Jesper Krogh

On 2011-03-22 21:43, Robert Haas wrote:

I took a crack at implementing the first approach described above,
which seems to be by far the simplest idea we've come up with to date.
  Patch attached.  It doesn't seem to be that complicated, which could
mean either that it's not that complicated or that I'm missing
something.  Feel free to point and snicker in the latter case.


Looks simple, but there is no benefit on the usage side in the patch,
so it isn't really testable yet? I would love to spend some time testing
when it's doable (even with rough corners).

I'm still a bit puzzled about how it would end up working with a page-level
visibility map bit for index scans. There is a clear drop-off in usability
when the change rate of the table goes up, which may or may not be
relevant, but I cannot really judge, since I haven't even got a ballpark
figure for how much table churn would disable, say, 50% of the usage.

= Really naive suggestion approaching =
Another layout might be to simply drag out t_xmin and t_xmax per row (8 bytes)
into a table by itself. This table will be way bigger than the one-bit-per-page
map, but could be WAL-logged like any other change in the system?

It would, by definition, make the visibility testing work (way faster
than today), no matter how fast the underlying table changes.

State of today (PG 8.4) is that a query like this:
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 5863.600 ms
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 659.832 ms
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 1005.765 ms

Somewhere around 15ns / tuple (not bad at all).
(The first run was probably half-warm.)

The average number of rows per page is somewhere between 4 and 8 for this
table. Assuming 8, and that the 69K matching rows are randomly distributed
among the 16M other tuples (a fair assumption in this case), the 600-1000ms
for the fresh-cache runs is the time to drag
69753 * 8192 (page size) = 571MB into memory for visibility testing alone;
on a warm cache all pages are already in main memory. Packing 16M tuples
with 8 bytes / tuple in a map would be around 128MB.

Given 8 bytes/row and random distribution of data, that would require us
to read all 128MB, so a speedup of x4 on this example, but it would roughly
let us count the entire table in the same time.

With regard to disk vs. memory hotness: those 128MB compare to a table
size of 32GB (with a TOAST table next to it of 64GB), but that shouldn't
be touched by the above query.


The ns/tuple number (today) on a thin table in my system is 
approaching 1ns / tuple.


If the page-level bitmap would be set quite fast on a fairly busy
system anyway, then the above is just noise, but I currently have
no feeling for it, and there is some math in there I have trouble
putting reliable ballpark numbers on.

There is, by all approaches, room for significant improvements in
visibility testing for a huge range of installations.

Can I drag out numbers on the frozenness of tuples from my current
systems to fill in the discussion? (How?)

Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Gokulakannan Somasundaram


 All operations that clear the bit are already WAL-logged.

 Is it the case with the visibility map also?

Thanks.


Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Markus Wanner
On 03/22/2011 09:33 PM, Robert Haas wrote:
 We might have a version of synchronous replication that works this way
 some day, but it's not the version we're shipping with 9.1.  The slave
 acknowledges the WAL records when they hit the disk (i.e. fsync) not
 when they are applied; WAL apply can lag arbitrarily.  The point is to
 guarantee clients that the WAL is on disk somewhere and that it will
 be replayed in the event of a failover.  Despite the fact that this
 doesn't work as you're describing, it's a useful feature in its own
 right.

In that sense, our approach may be more synchronous than most others,
because after the ACK is sent from the slave, the slave still needs to
apply the transaction data from WAL before it gets visible, while the
master needs to wait for the ACK to arrive at its side, before making it
visible there.

Ideally, these two latencies (disk seek and network induced) are just
about equal.  But of course, there's no such guarantee.  So whenever one
of the two is off by an order of magnitude or two (by use case or due to
a temporary overload), either the master or the slave may lag behind the
other machine.

What pleases me is that the guarantee from the slave is somewhat similar
to Postgres-R's: with its ACK, the receiving node doesn't guarantee the
transaction *is* applied locally, it just guarantees that it *will* be
able to do so sometime in the future.  Kind of a mind twister, though...

Regards

Markus



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Fujii Masao
On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 You could also argue for "log a warning, continue until we can open for Hot
 standby, then pause."

 I don't like that one much.

 I can write the patch once we know what we want. All of those options sound
 reasonable to me. This is such a corner-case that it doesn't make sense to
 make it user-configurable, though.

 I agree.  Since pause_at_recovery_target is ignored when
 hot_standby=off, I think it would be consistent to treat the case
 where hot_standby=on but can't actually be initiated the same way -
 just ignore the pause request and enter normal running.

When hot_standby = on and the recovery target is ahead of the consistent point,
the server doesn't enter normal running since a FATAL error happens. So I think
that it's more consistent to prevent the server from entering normal running
also when hot_standby = off.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Sync Rep v19

2011-03-23 Thread Fujii Masao
On Sat, Mar 19, 2011 at 11:28 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 10:25 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 8, 2011 at 7:05 AM, Fujii Masao masao.fu...@gmail.com wrote:
 * Smart shutdown
 Smart shutdown should wait for all the waiting backends to be acked, and
 should not cause them to forcibly exit. But this causes shutdown to get stuck
 indefinitely if there is no walsender at that time. To enable them to be acked
 even in that situation, we need to change postmaster so that it accepts the
 replication connection even during smart shutdown (until we reach the
 PM_SHUTDOWN_2 state). Postmaster already accepts the superuser
 connection to cancel a backup during smart shutdown. So I don't think that
 the idea of accepting the replication connection during smart shutdown is so
 ugly.

 * Fast shutdown
 I agree with you about fast shutdown. Fast shutdown should cause all the
 backends, including waiting ones, to exit immediately. At that time, a
 non-acked backend should not return success, according to the
 definition of sync rep. So we need to change a backend so that it removes
 itself from the waiting queue and exits before returning success,
 when it receives SIGTERM. This change causes the waiting backends to
 do the same even when pg_terminate_backend is called. But since
 they've not been acked yet, it seems reasonable to prevent them
 from returning the COMMIT.

 Comments? I'll create the patch barring objection.

 The fast smart shutdown part of this problem has been addressed.  The

 Ugh.  I mean the fast shutdown, of course, not the fast smart
 shutdown.  Anyway, point is:

 fast shutdown now OK
 smart shutdown still not OK
 do you want to write a patch?

 :-)

 smart shutdown case still needs work, and I think the consensus was
 that your proposal above was the best way to go with it.

 Do you still want to work up a patch for this?  If so, I can review.

Sure. Will do.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL

2011-03-23 Thread Marti Raudsepp
On Tue, Mar 22, 2011 at 01:02, Tom Lane t...@sss.pgh.pa.us wrote:
 I studied the code some more, and I think this probably can be made to
 work.  The basic idea is to have preprocess_minmax_aggregates build
 simplified queries like the above (working by modifying the query tree
 that exists at the point where it's called) and call query_planner on
 them.  Save aside the resulting path data, then let the regular planning
 process continue.  When optimize_minmax_aggregates is called, see
 whether the regular plan is cheaper than the sum of the path costs.
 If not, use the paths to construct a replacement plan, same as now.

Thanks a lot! I can confirm that this is now fixed in the git version, and
it now also works with partitioned tables, which is great news.

Regards,
Marti



Re: [HACKERS] Replication server timeout patch

2011-03-23 Thread Heikki Linnakangas

On 16.03.2011 11:11, Fujii Masao wrote:

On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote:

Agreed. I'll change the patch.


Done. I attached the updated patch.


I don't much like the API for this. Walsender shouldn't need to know
about the details of the FE/BE protocol; pq_putbytes_if_available()
seems too low-level to be useful.


I think a better API would be to have a non-blocking version of 
pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, 
so that when the message doesn't fit in the output buffer in 
pq_putmessage(), the buffer is enlarged instead of trying to flush it.


Attached is a patch using that approach. This is a much smaller patch, 
and easier to understand. I'm not totally happy with the walsender main 
loop; it seems to work as it is, but the logic has become quite 
complicated. Ideas welcome on how to simplify that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0ebee6..3192ef7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,6 +2019,28 @@ SET ENABLE_SEQSCAN TO OFF;
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+      <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>replication_timeout</varname> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the maximum time, in milliseconds, to wait for the reply
+        from the standby before terminating replication.  This is useful for
+        the primary server to detect the standby crash or network outage.
+        A value of zero turns this off.  This parameter can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+        The default value is 60 seconds.
+       </para>
+       <para>
+        To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+        must be enabled on the standby, and its value must be less than the
+        value of <varname>replication_timeout</>.
+       </para>
+      </listitem>
+     </varlistentry>
      </variablelist>
     </sect2>
 
@@ -2216,6 +2238,11 @@ SET ENABLE_SEQSCAN TO OFF;
        the <filename>postgresql.conf</> file or on the server command line.
        The default value is 10 seconds.
       </para>
+      <para>
+       When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+       <varname>wal_receiver_status_interval</> must be enabled, and its value
+       must be less than the value of <varname>replication_timeout</>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 3c7b05b..b6dc8cc 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -56,9 +56,11 @@
  *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
  *		pq_flush		- flush pending output
  *		pq_getbyte_if_available - get a byte if available without blocking
+ *		pq_flush_if_writable	- flush pending output if writable without blocking
  *
  * message-level I/O (and old-style-COPY-OUT cruft):
  *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
+ *		pq_putmessage_noblock - buffer a normal message without blocking (suppressed in COPY OUT mode)
  *		pq_startcopyout - inform libpq that a COPY OUT transfer is beginning
  *		pq_endcopyout	- end a COPY OUT transfer
  *
@@ -92,6 +94,7 @@
 #include miscadmin.h
 #include storage/ipc.h
 #include utils/guc.h
+#include utils/memutils.h
 
 /*
  * Configuration options
@@ -108,12 +111,15 @@ static char sock_path[MAXPGPATH];
  * Buffers for low-level I/O
  */
 
-#define PQ_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_RECV_BUFFER_SIZE 8192
 
-static char PqSendBuffer[PQ_BUFFER_SIZE];
+static char *PqSendBuffer;
+static int	PqSendBufferSize;
 static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
 
-static char PqRecvBuffer[PQ_BUFFER_SIZE];
+static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
 static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
 static int	PqRecvLength;		/* End of data available in PqRecvBuffer */
 
@@ -142,7 +148,9 @@ static int	Setup_AF_UNIX(void);
 void
 pq_init(void)
 {
-	PqSendPointer = PqRecvPointer = PqRecvLength = 0;
+	PqSendBufferSize = PQ_SEND_BUFFER_SIZE;
+	PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize);
+	PqSendPointer = PqSendStart = PqRecvPointer = PqRecvLength = 0;
 	PqCommBusy = false;
 	DoingCopyOut = false;
 	on_proc_exit(pq_close, 0);
@@ -762,7 +770,7 @@ pq_recvbuf(void)
 		int			r;
 
 		r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-		PQ_BUFFER_SIZE - PqRecvLength);
+		PQ_RECV_BUFFER_SIZE - PqRecvLength);
 
		if (r < 0)
 		{
@@ -1138,10 +1146,10 @@ 

Re: [HACKERS] writing a script to examine dead tuples

2011-03-23 Thread Heikki Linnakangas

On 23.03.2011 01:36, aaronenabs wrote:

Can anyone help me, i am trying to carry out an investigation
which involves accessing dead tuples within the postgresql.

I have been advised i could write a script that allows me to
examine dead tuples and am seeking advise on how to achieve this,
or pointers that can help me going in the right direction. I am
relatively new to this forum and postgresql and have been
assigned postgresql as a topic within my studies which as lead me to
creating a script to perform this task.

If anyone can help me i would be really grateful.


One approach is to use the pageinspect contrib module or pg_filedump to 
view the pages and tuples at a low level. Another is to modify 
PostgreSQL sources to return dead tuples that would normally be skipped.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 You could also argue for "log a warning, continue until we can open for Hot
 standby, then pause."

 I don't like that one much.

 I can write the patch once we know what we want. All of those options sound
 reasonable to me. This is such a corner-case that it doesn't make sense to
 make it user-configurable, though.

 I agree.  Since pause_at_recovery_target is ignored when
 hot_standby=off, I think it would be consistent to treat the case
 where hot_standby=on but can't actually be initiated the same way -
 just ignore the pause request and enter normal running.

 When hot_standby = on and the recovery target is ahead of the consistent
 point, the server doesn't enter normal running since a FATAL error happens.
 So I think that it's more consistent to prevent the server from entering
 normal running also when hot_standby = off.

Actually, my previous email was all nonsense, wasn't it?  If we don't
reach the consistency point, we can't enter normal running anyway -
shut down is the only option no matter what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 3:27 AM, Markus Wanner mar...@bluegap.ch wrote:
 On 03/22/2011 09:33 PM, Robert Haas wrote:
 We might have a version of synchronous replication that works this way
 some day, but it's not the version we're shipping with 9.1.  The slave
 acknowledges the WAL records when they hit the disk (i.e. fsync) not
 when they are applied; WAL apply can lag arbitrarily.  The point is to
 guarantee clients that the WAL is on disk somewhere and that it will
 be replayed in the event of a failover.  Despite the fact that this
 doesn't work as you're describing, it's a useful feature in its own
 right.

 In that sense, our approach may be more synchronous than most others,
 because after the ACK is sent from the slave, the slave still needs to
 apply the transaction data from WAL before it gets visible, while the
 master needs to wait for the ACK to arrive at its side, before making it
 visible there.

 Ideally, these two latencies (disk seek and network induced) are just
 about equal.  But of course, there's no such guarantee.  So whenever one
 of the two is off by an order of magnitude or two (by use case or due to
 a temporary overload), either the master or the slave may lag behind the
 other machine.

 What pleases me is that the guarantee from the slave is somewhat similar
 to Postgres-R's: with its ACK, the receiving node doesn't guarantee the
 transaction *is* applied locally, it just guarantees that it *will* be
 able to do so sometime in the future.  Kind of a mind twister, though...

Yes.  What this won't do is let you build a big load-balancing network
(at least not without great caution about what you assume).  What it
will do is make it really, really hard to lose committed transactions.
Both good things, but different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_ctl restart - behaviour based on wrong instance

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 1:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Mar 19, 2011 at 10:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 1:19 PM, Erik Rijkers e...@xs4all.nl wrote:
 This is OK and expected.  But then it continues (in the logfile) with:

 FATAL:  lock file postmaster.pid already exists
 HINT:  Is another postmaster (PID 20519) running in data directory
 /var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data?

 So, complaints about the *other* instance.  It doesn't happen once a
 successful start (with pg_ctl start) has happened.

 I'm guessing that leftover postmaster.pid contents might be
 responsible for this?

 The cause is that pg_ctl restart uses the postmaster.opts which was
 created in the primary. Since its content was something like
 pg_ctl -D vanilla_1/data, vanilla_1/data/postmaster.pid was checked
 wrongly.

 The simple workaround is to exclude postmaster.opts from the backup
 as well as postmaster.pid. But when postmaster.opts doesn't exist,
 pg_ctl restart cannot start up the server. We might also need to change
 the code of pg_ctl restart so that it does just pg_ctl start when
 postmaster.opts doesn't exist.

Sounds reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:29 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 All operations that clear the bit are already WAL-logged.

 Is it the case with the visibility map also?
 Thanks.

Yes.  Look at the comment that the patch removes.  That describes the
problem being fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:16 AM, Jesper Krogh jes...@krogh.cc wrote:
 On 2011-03-22 21:43, Robert Haas wrote:

 I took a crack at implementing the first approach described above,
 which seems to be by far the simplest idea we've come up with to date.
  Patch attached.  It doesn't seem to be that complicated, which could
 mean either that it's not that complicated or that I'm missing
 something.  Feel free to point and snicker in the latter case.

 Looks simple, but there is no benefit on the usage side in the patch,
 so it isn't really testable yet? I would love to spend some time testing
 when it's doable (even with rough corners.)

What it probably needs right now is some crash testing - insert a
database panic at various points in the code and then check whether
the state after recovery is still OK.  Also some code review from
people who understand recovery better than me.  *waves to Heikki*

There's a lot more work that will have to be done before this starts
to produce user-visible performance benefits, and then a lot more work
after that before we've exhausted all the possibilities.  I can't cope
with all that right now.  This is basic infrastructure, that will
eventually enable a variety of cool stuff, but isn't particularly sexy
by itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Markus Wanner
On 03/23/2011 12:52 PM, Robert Haas wrote:
 Yes.  What this won't do is let you build a big load-balancing network
 (at least not without great caution about what you assume).

This sounds too strong to me.  Session-aware load balancing is pretty
common these days.  It's the default mode of PgBouncer, for example.
Not much caution required there, IMO.  Or what pitfalls did you have in
mind?

 What it
 will do is make it really, really hard to lose committed transactions.
 Both good things, but different.

...you can still get both at the same time.  At least as long as you are
happy with session-aware load balancing.  And who really needs finer
grained balancing?

(Note that no matter how fine-grained you balance, you are still bound
to a (single core of a) single node.  That changes with distributed
querying, and things really start to get interesting there... but we are
far from that, yet.)

Regards

Markus



[HACKERS] copy / paste object - preliminary part

2011-03-23 Thread Vladimir Kokovic
Hi,

I have implemented a preliminary part of the copy/paste object functions and
now expect confirmation that the implementation is correct.

Changed files:
frm/frmMain.cpp, include/frm/frmMain.h

New files:
frm/frmPasteObject.cpp, include/frm/frmPasteObject.h

Best regards,
Vladimir Kokovic, DP senior, Belgrade, Serbia


pgadmin3.diff
Description: Binary data
//
//
// pgAdmin III - PostgreSQL Tools
//
// Copyright (C) 2002 - 2011, The pgAdmin Development Team
// This software is released under the PostgreSQL Licence
//
// frmPasteObject.cpp - Copy/Paste object functions
//
//

#include <wx/wx.h>

#include "pgAdmin3.h"
#include "frm/frmPasteObject.h"
#include "schema/pgSchema.h"
#include "schema/pgTable.h"

frmPasteObject::frmPasteObject(frmMain *form, pgObject *sourceobj, pgObject *targetobj)
{
	this->mainform = form;
	this->sourceobj = sourceobj;
	this->targetobj = targetobj;
}

wxArrayString *getSchemaTables(pgSchema *srcschema)
{
	wxArrayString *objArray = new wxArrayString();

	wxString query = wxT("SELECT relname ")
	                 wxT("FROM pg_namespace n ")
	                 wxT("LEFT JOIN pg_class c ON n.oid=c.relnamespace AND relkind='r' ")
	                 wxT("WHERE nspname='") + srcschema->GetIdentifier() + wxT("' ");
	query += wxT("ORDER BY relname");

	pgSet *objects = srcschema->GetDatabase()->ExecuteSet(query);

	if (objects)
	{
		while (!objects->Eof())
		{
			if (!objects->GetVal(wxT("relname")).IsNull())
			{
				objArray->Add(objects->GetVal(wxT("relname")));
			}
			objects->MoveNext();
		}
		delete objects;
	}

	return objArray;
}

void frmPasteObject::process()
{
	if (!sourceobj || !targetobj)
	{
		return;
	}

	wxArrayString *srcObjArray;
	pgSchema *targetschema = (pgSchema *)targetobj;
	pgSchema *srcschema = 0;
	pgTable *table = (sourceobj->GetMetaType() == PGM_TABLE) ? (pgTable *)sourceobj : 0;
	if (table)
	{
		wxMessageBox(
			wxT("Paste source table\n") +
			table->GetSchema()->GetDatabase()->GetIdentifier() + wxT(".") + table->GetSchema()->GetIdentifier() + wxT(".") + table->GetIdentifier() + wxT("\n") +
			wxT(" into schema\n") + targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier());
	}
	else
	{
		srcschema = (pgSchema *)sourceobj;
		wxMessageBox(
			wxT("Paste source schema objects\n") +
			srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier() + wxT("\n") +
			wxT(" into schema\n") +
			targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier());
	}

	if (!sourceobj->GetConnection() || !targetobj->GetConnection())
	{
		wxMessageBox(
			_("Both source and target schema connections should be established before paste object operation !"));
		return;
	}

	if (srcschema)
	{
		srcObjArray = ::getSchemaTables(srcschema);
	}
	else
	{
		srcObjArray = new wxArrayString();
		srcObjArray->Add(table->GetIdentifier());
		srcschema = table->GetSchema();
	}

	wxString msg;
	for (unsigned int i = 0; i < srcObjArray->Count(); i++)
	{
		msg = wxT("COPY TABLE:") +
			srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier() + wxT(".") + srcObjArray->Item(i) +
			wxT(" INTO:") + targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier();
		mainform->GetStatusBar()->SetStatusText(msg, 1);
		// future implementation
	}
	msg = wxString::Format(wxT("%d TABLE(s) COPIED FROM %s TO %s"), srcObjArray->Count(),
		(srcschema->GetDatabase()->GetIdentifier() + wxT(".") + srcschema->GetIdentifier()).c_str(),
		(targetschema->GetDatabase()->GetIdentifier() + wxT(".") + targetschema->GetIdentifier()).c_str());
	mainform->GetStatusBar()->SetStatusText(msg, 1);

	delete srcObjArray;
}

frmPasteObject::~frmPasteObject()
{
}

//
//
// pgAdmin III - PostgreSQL Tools
//
// Copyright (C) 2002 - 2011, The pgAdmin Development Team
// This software is released under the PostgreSQL Licence
//
// frmPasteObject.h - Copy/Paste object functions
//
//

#ifndef FRMPASTEOBJECT_H
#define	FRMPASTEOBJECT_H

#include "frm/frmMain.h"
#include "schema/pgObject.h"

class frmPasteObject
{
public:
	frmPasteObject(frmMain *form, pgObject *sourceobj, pgObject *targetobj);
	void process();
	virtual ~frmPasteObject();
private:
	frmMain *mainform;
	pgObject *sourceobj;
	pgObject *targetobj;
};

#endif	/* FRMPASTEOBJECT_H */




Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 11:50 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 You could also argue for "log a warning, continue until we can open for Hot
 standby, then pause."

 I don't like that one much.

 I can write the patch once we know what we want. All of those options sound
 reasonable to me. This is such a corner-case that it doesn't make sense to
 make it user-configurable, though.

 I agree.  Since pause_at_recovery_target is ignored when
 hot_standby=off, I think it would be consistent to treat the case
 where hot_standby=on but can't actually be initiated the same way -
 just ignore the pause request and enter normal running.

 When hot_standby = on and the recovery target is ahead of the consistent
 point, the server doesn't enter normal running since a FATAL error happens.
 So I think that it's more consistent to prevent the server from entering
 normal running also when hot_standby = off.

 Actually, my previous email was all nonsense, wasn't it?  If we don't
 reach the consistency point, we can't enter normal running anyway -
 shut down is the only option no matter what.

Presumably you mean that the way it's currently coded is the way it should stay?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] corner case about replication and shutdown

2011-03-23 Thread Fujii Masao
Hi,

When I read the shutdown code to create the smart shutdown patch for sync rep,
I found the corner case where shutdown can get stuck infinitely. This happens
when postmaster reaches PM_WAIT_BACKENDS state before walsender marks
itself as WAL sender process for streaming WAL (i.e., before walsender calls
MarkPostmasterChildWalSender). In this case,CountChildren(NORMAL) in
PostmasterStateMachine() returns non-zero because normal backend (i.e.,
would-be walsender) is running, and postmaster in PM_WAIT_BACKENDS state
gets out of PostmasterStateMachine(). Then the backend receives
START_REPLICATION command, declares itself as walsender and
CountChildren(NORMAL) returns zero.

The problem is that this declaration doesn't trigger PostmasterStateMachine()
at all. So, even though there are no normal backends left, the postmaster never
calls PostmasterStateMachine() again and cannot move its state out of
PM_WAIT_BACKENDS.

I think this problem is mostly harmless in practice since it doesn't happen
often. But it can happen...

The simple fix is to change ServerLoop() so that it periodically calls
PostmasterStateMachine() while a shutdown is in progress. Though I considered
changing PostmasterStateMachine() itself instead, that looked complicated. Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Merlin Moncure
On Wed, Mar 23, 2011 at 1:16 AM, Jesper Krogh jes...@krogh.cc wrote:
 On 2011-03-22 21:43, Robert Haas wrote:

 I took a crack at implementing the first approach described above,
 which seems to be by far the simplest idea we've come up with to date.
  Patch attached.  It doesn't seem to be that complicated, which could
 mean either that it's not that complicated or that I'm missing
 something.  Feel free to point and snicker in the latter case.

 Looks simple, but there is no benefit on the usage side in the patch yet,
 so it isn't really testable? I would love to spend some time testing
 when it's doable (even with rough corners).

 I'm still a bit puzzled about how it would end up working with a page-level
 visibility map bit for index scans. There is a clear drop-off in usability
 when the change rate of the table goes up, which may or may not be
 relevant, but I cannot really judge, since I haven't even got a ballpark
 figure for how much table churn would disable, say, 50% of the usage.

How much benefit you are going to get is going to be really workload
dependent.  In a lot of cases the distribution of writes is going to be
really non-uniform, so that a small percentage of records gets the
majority of the writes across the database generally.  Reliable
PD_ALL_VISIBLE opens the door to optimizing around this pattern, which
I'd estimate the vast majority of databases follow to varying degrees.

It's really hard to overemphasize how important, in performance terms,
the features that mitigate the relative downsides of our MVCC
implementation are.  The HOT feature in 8.3 was an absolute breakthrough
in terms of Postgres performance, and I expect this will open similar
doors.

merlin



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 9:38 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Actually, my previous email was all nonsense, wasn't it?  If we don't
 reach the consistency point, we can't enter normal running anyway -
 shut down is the only option no matter what.

 Presumably you mean that the way its currently coded is the way it should 
 stay?

Uh, maybe, but it's not obvious to me that it actually is coded that
way.  I don't see any safeguard that prevents recovery from pausing
before consistency is reached.  Is there one?  Where?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 8:16 AM, Markus Wanner mar...@bluegap.ch wrote:
 On 03/23/2011 12:52 PM, Robert Haas wrote:
 Yes.  What this won't do is let you build a big load-balancing network
 (at least not without great caution about what you assume).

 This sounds too strong to me.  Session-aware load balancing is pretty
 common these days.  It's the default mode of PgBouncer, for example.
 Not much caution required there, IMO.  Or what pitfalls did you have in
 mind?

Well, just the one we were talking about: a COMMIT on one node doesn't
guarantee that the transaction is visible on the other node, just
that it will become visible there eventually, even if a crash happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I agree to get rid of write_location.

 No, don't remove it.

 We seem to be just looking for things to tweak without any purpose.
 Removing this adds nothing for us.

 We will have the column in the future, it is there now, so leave it.

 Well then can we revert the part of your patch that causes it to not
 actually work any more?

Specifically, if we're not going to remove write location, then I
think we need to apply something like the attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


write-location-fix.patch
Description: Binary data



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
  On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com 
wrote:
  Can't you make just one large mapping and lock it in 8k regions? I
  thought the problem with mmap was not being able to detect other
  processes
  (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html)
  compatibility issues (possibly obsolete), etc.
  
  I was assuming that locking part of a mapping would force the kernel
  to split the mapping. It has to record the locked state somewhere so
  it needs a data structure that represents the size of the locked
  section and that would, I assume, be the mapping.
  
  It's possible the kernel would not in fact fall over too badly doing
  this. At some point I'll go ahead and do experiments on it. It's a bit
  fraught though as it the performance may depend on the memory
  management features of the chipset.
  
  That said, that's only part of the battle. On 32bit you can't map the
  whole database as your database could easily be larger than your
  address space. I have some ideas on how to tackle that but the
  simplest test would be to just mmap 8kB chunks everywhere.
 
 Even on 64 bit systems you only have 48 bit address space which is not
 a theoretical  limitation.  However, at least on linux you can map in
 and map out pretty quick (10 microseconds paired on my linux vm) so
 that's not so big of a deal.  Dealing with rapidly growing files is a
 problem.  That said, probably you are not going to want to reserve
 multiple gigabytes in 8k non contiguous chunks.
 
  But it's worse than that. Since you're not responsible for flushing
  blocks to disk any longer you need some way to *unlock* a block when
  it's possible to be flushed. That means when you flush the xlog you
  have to somehow find all the blocks that might no longer need to be
  locked and atomically unlock them. That would require new
  infrastructure we don't have though it might not be too hard.
  
  What would be nice is a mlock_until() where you eventually issue a
  call to tell the kernel what point in time you've reached and it
  unlocks everything older than that time.
 
 I wonder if there is any reason to mlock at all...if you are going to
 'do' mmap, can't you just hide under current lock architecture for
 actual locking and do direct memory access without mlock?
 
 merlin
I can't reproduce this. A simple test shows 2x faster reads with mmap than
with read().

I'm sending what I've done with mmap (really ugly, but I'm in a forest). It is
a read-only solution, so init the database first with some amount of data (I
have about 300MB) (the 2nd-level scripts may do this for you).

This is what I found:
1. If I don't require the new mmap (mmap with MAP_FIXED) to be placed in the
previous region, and just do munmap / mmap with each query, execution time
grows by about 10%.

2. Sometimes it is enough just to comment or uncomment something that has no
side effects on code flow (in bufmgr.c; (un)comment some unused if; put in a
NULL that will be replaced), and e.g. query execution time may grow 2x.

3. My initial solution was 2% faster (about 9ms when reading); now it's 10%
slower, after making it more usable.

Regards,
Radek


pg_mmap_20110323.patch.bz2
Description: application/bzip



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Tuesday 22 March 2011 23:06:02
 On Tue, Mar 22, 2011 at 4:28 PM, Radosław Smogura
 
 rsmog...@softperience.eu wrote:
  Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
  
  On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
   On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com
  
  wrote:
   Can't you make just one large mapping and lock it in 8k regions? I
   thought the problem with mmap was not being able to detect other
   processes
    (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html)
    compatibility issues (possibly obsolete), etc.
   
   I was assuming that locking part of a mapping would force the kernel
   to split the mapping. It has to record the locked state somewhere so
   it needs a data structure that represents the size of the locked
   section and that would, I assume, be the mapping.
   
   It's possible the kernel would not in fact fall over too badly doing
   this. At some point I'll go ahead and do experiments on it. It's a bit
   fraught though as it the performance may depend on the memory
   management features of the chipset.
   
   That said, that's only part of the battle. On 32bit you can't map the
   whole database as your database could easily be larger than your
   address space. I have some ideas on how to tackle that but the
   simplest test would be to just mmap 8kB chunks everywhere.
  
  Even on 64 bit systems you only have 48 bit address space which is not
  a theoretical  limitation.  However, at least on linux you can map in
  and map out pretty quick (10 microseconds paired on my linux vm) so
  that's not so big of a deal.  Dealing with rapidly growing files is a
  problem.  That said, probably you are not going to want to reserve
  multiple gigabytes in 8k non contiguous chunks.
  
   But it's worse than that. Since you're not responsible for flushing
   blocks to disk any longer you need some way to *unlock* a block when
   it's possible to be flushed. That means when you flush the xlog you
   have to somehow find all the blocks that might no longer need to be
   locked and atomically unlock them. That would require new
   infrastructure we don't have though it might not be too hard.
   
   What would be nice is a mlock_until() where you eventually issue a
   call to tell the kernel what point in time you've reached and it
   unlocks everything older than that time.
  
  I wonder if there is any reason to mlock at all...if you are going to
  'do' mmap, can't you just hide under current lock architecture for
  actual locking and do direct memory access without mlock?
  
  merlin
  
  Actually after dealing with mmap and adding munmap I found crucial thing
  why to not use mmap:
  You need to munmap, and for me this takes much time, even if I read with
  SHARED | PROT_READ, it's looks like Linux do flush or something else,
  same as with MAP_FIXED, MAP_PRIVATE, etc.
 
 can you produce small program demonstrating the problem?  This is not
 how things should work AIUI.
 
 I was thinking about playing with mmap implementation of clog system
 -- it's perhaps better fit.  clog is rigidly defined size, and has
 very high performance requirements.  Also it's much less changes than
 reimplementing heap buffering, and maybe not so much affected by
 munmap.
 
 merlin

Ah... just one thing, maybe useful for why performance is lost with huge memory.
I saw mmapped buffers are allocated at something like 0x007, so definitely above
4GB.



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-23 Thread Robert Haas
On Fri, Mar 18, 2011 at 10:10 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Mar 7, 2011 at 3:44 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Mar 7, 2011 at 5:27 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Mar 7, 2011 at 7:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Efficient transaction-controlled synchronous replication.
 If a standby is broadcasting reply messages and we have named
 one or more standbys in synchronous_standby_names then allow
 users who set synchronous_replication to wait for commit, which
 then provides strict data integrity guarantees. Design avoids
 sending and receiving transaction state information so minimises
 bookkeeping overheads. We synchronize with the highest priority
 standby that is connected and ready to synchronize. Other standbys
 can be defined to takeover in case of standby failure.

 This version has very strict behaviour; more relaxed options
 may be added at a later date.

 Pretty cool! I'd appreciate very much your efforts and contributions.

 And,, I found one bug ;) You seem to have wrongly removed the check
 of max_wal_senders in SyncRepWaitForLSN. This can make the
 backend wait for replication even if max_wal_senders = 0. I could produce
 this problematic situation in my machine. The attached patch fixes this 
 problem.

         if (strlen(SyncRepStandbyNames) > 0 && max_wal_senders == 0)
                 ereport(ERROR,
                                 (errmsg("Synchronous replication requires WAL streaming (max_wal_senders > 0)")));

 The above check should be required also after pg_ctl reload since
 synchronous_standby_names can be changed by SIGHUP?
 Or how about just removing that? If the patch I submitted is
 committed, empty synchronous_standby_names and max_wal_senders = 0
 settings are no longer unsafe.

 This configuration is now harmless in the sense that it no longer
 horribly breaks the entire system, but it's still pretty useless, so
 this might be deemed a valuable sanity check.  However, I'm reluctant
 to leave it in there, because someone could change their config to
 this state, pg_ctl reload, see everything working, and then later stop
 the cluster and be unable to start it back up again.  Since most
 people don't shut their database systems down very often, they might
 not discover that they have an invalid config until much later.  I
 think it's probably not a good idea to have configs that are valid on
 reload but prevent startup, so I'm inclined to either remove this
 check altogether or downgrade it to a warning.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Robert Haas
On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
guilla...@lelarge.info wrote:
 While working on adding support for SQL/Med objects to pgAdmin, I'm
 quite surprised to see there is no way to add comments to SQL/Med
 objects. Is this on purpose or is it just something that was simply missed?

I think it's an oversight.  We should probably fix this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 3:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 8:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 3:52 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I agree to get rid of write_location.

 No, don't remove it.

 We seem to be just looking for things to tweak without any purpose.
 Removing this adds nothing for us.

 We will have the column in the future, it is there now, so leave it.

 Well then can we revert the part of your patch that causes it to not
 actually work any more?

 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

The protocol supports different write/fsync values, so the view should
display them.
We don't know what the standby end will be doing with the data in all cases.

For the main server, making the additional change will just decrease
performance, for no benefit.

In the future we would have a parameter that says how often we send
replies, but there's no point having a parameter if there is only one
meaningful value for standby servers currently.

Please leave this as it is now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

   while (walrcv_receive(0, type, buf, len))
   XLogWalRcvProcessMsg(type, buf, len);
 
 + /* Let the master know that we received some data. */
 + XLogWalRcvSendReply();

What if we didn't actually receive any new data?

regards, tom lane



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
 guilla...@lelarge.info wrote:
 While working on adding support for SQL/Med objects to pgAdmin, I'm
 quite surprised to see there is no way to add comments to SQL/Med
 objects. Is this on purpose or is it just something that was simply missed?

 I think it's an oversight.  We should probably fix this.

Yeah, I had a private TODO about that.  I'd like to see if we can
refactor the grammar to eliminate some of the duplication there
as well as the potential for oversights of this sort.  I believe
that USER MAPPINGs are missing from ObjectType as well as a bunch
of other basic places ...

regards, tom lane



Re: [HACKERS] Comments on SQL/Med objects

2011-03-23 Thread Guillaume Lelarge
On 23/03/2011 17:53, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 22, 2011 at 6:23 PM, Guillaume Lelarge
 guilla...@lelarge.info wrote:
 While working on adding support for SQL/Med objects to pgAdmin, I'm
 quite surprised to see there is no way to add comments to SQL/Med
 objects. Is this on purpose or is it just something that was simply missed?
 
 I think it's an oversight.  We should probably fix this.
 
 Yeah, I had a private TODO about that.  I'd like to see if we can
 refactor the grammar to eliminate some of the duplication there
 as well as the potential for oversights of this sort.  I believe
 that USER MAPPINGs are missing from ObjectType as well as a bunch
 of other basic places ...
 

OK, great. Thanks for your answers.


-- 
Guillaume
 http://www.postgresql.fr
 http://dalibo.com



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

 The protocol supports different write/fsync values, so the view should
 display them.

That's exactly the point.  Currently, we have a protocol that supports
different write and fsync values, but the code as written does not
actually ever send a reply at any time when the two values can ever be
different.  So there is no point in sending both of them.  The write
location is completely redundant with the fsync location and therefore
completely useless.  We shouldn't bother sending the value twice, or
displaying it twice, if it's absolutely 100% guaranteed to be
identical in every case.

The point of the patch that I posted is that it restores the previous
behavior, where we send an update before flushing WAL and again after
flushing WAL.  If we do that, then the write location can be ahead of
the flush location when we've written but not flushed.  If we don't do
that, and only send replies after flushing everything, then the two
fields are perforce always the same on the master.  I don't see that
as being a useful behavior, and in fact I think it could be quite
confusing.  Someone might assume that if we bother to expose both a
write_location and a flush_location, they are somehow different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 12:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

                       while (walrcv_receive(0, type, buf, len))
                               XLogWalRcvProcessMsg(type, buf, len);

 +                     /* Let the master know that we received some data. */
 +                     XLogWalRcvSendReply();

 What if we didn't actually receive any new data?

The portion of the code immediately preceding what's included in the
diff guards against that, and there is a second guard in
XLogWalRcvSendReply().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Jim Nasby
On Mar 22, 2011, at 2:53 PM, Robert Haas wrote:
 On Tue, Mar 22, 2011 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 11:14 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Maybe the thing to focus on first is the oft-discussed benchmark
 farm (similar to the build farm), with a good mix of loads, so
 that the impact of changes can be better tracked for multiple
 workloads on a variety of platforms and configurations.  Without
 something like that it is very hard to justify the added complexity
 of an idea like this in terms of the performance benefit gained.
 
 A related area that could use some looking at is why performance tops
 out at shared_buffers ~8GB and starts to fall thereafter.
 
 Under what circumstances does this happen?  Can a simple pgbench -S
 with a large scaling factor elicit this behavior?
 
 To be honest, I'm mostly just reporting what I've heard Greg Smith say
 on this topic.   I don't have any machine with that kind of RAM.

When we started using 192G servers we tried switching our largest OLTP database 
(would have been about 1.2TB at the time) from 8GB shared buffers to 28GB. 
Performance went down enough to notice; I don't have any solid metrics, but I'd 
ballpark it at 10-15%.

One thing that I've always wondered about is the logic of having backends run 
the clocksweep on a normal basis. OS's that use clock-sweep have a dedicated 
process to run the clock in the background, with the intent of keeping X amount 
of pages on the free list. We actually have most of the mechanisms to do that, 
we just don't have the added process. I believe bg_writer was intended to 
handle that, but in reality I don't think it actually manages to keep much of 
anything on the free list. Once we have a performance testing environment I'd 
be interested to test a modified version that includes a dedicated background 
clock sweep process that strives to keep X amount of buffers on the free list.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Susanne Ebrecht

Hello Bernd,

On 21.03.2011 18:44, Bernd Helmle wrote:


Attached minor patch extends \dt to use pg_table_size() starting with 
PostgreSQL 9.0, not sure if we backport such changes though. It would 
be interesting for 9.1, however. 


As I already told you:

I tested and it worked.
The code looks correct to me.

You should just send the code to a beauty farm - the wrinkles (braces)
could be placed better; also it could be more. :)


Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com




Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

 The protocol supports different write/fsync values, so the view should
 display them.

 That's exactly the point.

 No, it's not.

 Currently, we have a protocol that supports
 different write and fsync values, but the code as written does not
 actually ever send a reply at any time when the two values can ever be
 different.  So there is no point in sending both of them.  The write
 location is completely redundant with the fsync location and therefore
 completely useless.  We shouldn't bother sending the value twice, or
 displaying it twice, if it's absolutely 100% guaranteed to be
 identical in every case.

 As of 9.1, we now support other tools that use the protocol, so you
 cannot assume you know what is being sent, just because one sender has
 certain characteristics.

Oh, really?  Is this strictly hypothetical or is such a beast
planned/already in existence?

I'm just afraid this is going to be confusing to users who will expect
it to do something that it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 2:43 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Specifically, if we're not going to remove write location, then I
 think we need to apply something like the attached.

 The protocol supports different write/fsync values, so the view should
 display them.

 That's exactly the point.

 No, it's not.

 Currently, we have a protocol that supports
 different write and fsync values, but the code as written does not
 actually ever send a reply at any time when the two values can ever be
 different.  So there is no point in sending both of them.  The write
 location is completely redundant with the fsync location and therefore
 completely useless.  We shouldn't bother sending the value twice, or
 displaying it twice, if it's absolutely 100% guaranteed to be
 identical in every case.

 As of 9.1, we now support other tools that use the protocol, so you
 cannot assume you know what is being sent, just because one sender has
 certain characteristics.

 Oh, really?  Is this strictly hypothetical or is such a beast
 planned/already in existence?

Ask Magnus.

In any case, that's not the only argument for keeping it. We introduce
the view in this release and I would like it to stay the same from
now, since we know we will need that info later.

No more minor tweaks, please.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 1:53 PM, Jim Nasby j...@nasby.net wrote:
 When we started using 192G servers we tried switching our largest OLTP 
 database (would have been about 1.2TB at the time) from 8GB shared buffers to 
 28GB. Performance went down enough to notice; I don't have any solid metrics, 
 but I'd ballpark it at 10-15%.

 One thing that I've always wondered about is the logic of having backends run 
 the clocksweep on a normal basis. OS's that use clock-sweep have a dedicated 
 process to run the clock in the background, with the intent of keeping X 
 amount of pages on the free list. We actually have most of the mechanisms to 
 do that, we just don't have the added process. I believe bg_writer was 
 intended to handle that, but in reality I don't think it actually manages to 
 keep much of anything on the free list. Once we have a performance testing 
 environment I'd be interested to test a modified version that includes a 
 dedicated background clock sweep process that strives to keep X amount of 
 buffers on the free list.

It looks like the only way anything can ever get put on the free list
right now is if a relation or database is dropped.  That doesn't seem
too good.  I wonder if the background writer shouldn't be trying to
maintain the free list.  That is, perhaps BgBufferSync() should notice
when the number of free buffers drops below some threshold, and run
the clock sweep enough to get it back up to that threshold.

On a related note, I've been thinking about whether we could make
bgwriter_delay adaptively self-tuning.  If we notice that we
overslept, we don't sleep as long the next time; if not much happens
while we sleep, we sleep longer the next time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Robert Haas
On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 In any case, that's not the only argument for keeping it. We introduce
 the view in this release and I would like it to stay the same from
 now on, since we know we will need that info later.

At least as I understand it, it's not our project policy to carry
around code that doesn't accomplish anything useful.  I have no
objection to keeping the field; I simply think that if we're going to
have it, we should make it work, as in fact it did before you changed
it without discussion.  You haven't offered any evidence at all that
it introduces any kind of a performance regression AT ALL, much less
that any such regression can't be trivially patched around by
making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
advanced.  The onus is as much on you to justify the change as it is
on me to justify changing it back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Robert Haas
On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle maili...@oopsware.de wrote:
 It struck me today again that \dt+ isn't displaying the accurate table size
 for tables, since it still uses pg_relation_size(). With pg_table_size()
 available since PostgreSQL 9.0, I believe it would be more useful to have
 the total acquired storage displayed, including implicit objects (the
 mentioned case where it was not very useful atm was a table with a big
 TOAST table).

I guess the threshold question for this patch is whether
pg_table_size() is a more accurate table size or just a different
one.  It could possibly be confusing to display one value in that
column when the server is >= 9.0 and the client is >= 9.1, and a
different value when the server is < 9.0 or the client is < 9.1.

On the other hand, it's clear that there are several people in favor
of this change, so maybe we should just go ahead and do it.  Not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Greg Stark
On Wed, Mar 23, 2011 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
 It looks like the only way anything can ever get put on the free list
 right now is if a relation or database is dropped.  That doesn't seem
 too good.  I wonder if the background writer shouldn't be trying to
 maintain the free list.  That is, perhaps BgBufferSync() should notice
 when the number of free buffers drops below some threshold, and run
 the clock sweep enough to get it back up to that threshold.


I think this is just a terminology discrepancy. In postgres the free
list is only used for buffers that contain no useful data at all. The
only time there are buffers on the free list is at startup or if a
relation or database is dropped.

Most of the time blocks are read into buffers that already contain
other data. Candidate buffers to evict are buffers that have been used
least recently. That's what the clock sweep implements.

What the bgwriter's responsible for is looking at the buffers *ahead*
of the clock sweep and flushing them to disk. They stay in ram and
don't go on the free list, all that changes is that they're clean and
therefore can be reused without having to do any i/o.

I'm a bit skeptical that this works, because as soon as the bgwriter
saturates the I/O the OS will throttle the rate at which it can write.
When that happens, even a few dozen milliseconds will be plenty to
allow the purely user-space processes consuming the buffers to catch
up instantly.

But Greg Smith has done a lot of work tuning the bgwriter so that it
is at least useful in some circumstances. I could well see it being
useful for systems where latency matters and the i/o is not saturated.

-- 
greg



Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-23 Thread Simon Riggs
On Wed, Mar 23, 2011 at 8:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 3:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 In any case, that's not the only argument for keeping it. We introduce
 the view in this release and I would like it to stay the same from
 now on, since we know we will need that info later.

 At least as I understand it, it's not our project policy to carry
 around code that doesn't accomplish anything useful.  I have no
 objection to keeping the field; I simply think that if we're going to
 have it, we should make it work, as in fact it did before you changed
 it without discussion.  You haven't offered any evidence at all that
 it introduces any kind of a performance regression AT ALL, much less
 that any such regression can't be trivially patched around by
 making SyncRepReleaseWaiters exit quickly if the flush LSN hasn't
 advanced.  The onus is as much on you to justify the change as it is
 on me to justify changing it back.

What a stupid conversation.

There's no onus on me to have to keep justifying to you why the code
is the way it is, but I do.

If you want to make a change that I already know reduces performance,
you have to have a good reason. So far, you don't.

Stop fussing and wrap the release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Mar 23 17:24:59 -0300 2011:
 On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle maili...@oopsware.de wrote:
  It struck me today again that \dt+ isn't displaying the accurate table size
  for tables, since it still uses pg_relation_size(). With pg_table_size()
  available since PostgreSQL 9.0, I believe it would be more useful to have
  the total acquired storage displayed, including implicit objects (the
  mentioned case where it was not very useful atm was a table with a big
  TOAST table).
 
 I guess the threshold question for this patch is whether
 pg_table_size() is a more accurate table size or just a different
 one.

Not including the toast table and index in the size is just plain wrong.
Reporting the size without the toast objects is an implementation
artifact that should not be done unless explicitly requested.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Greg Stark gsst...@mit.edu Wednesday 23 March 2011 21:30:04
 On Wed, Mar 23, 2011 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
  It looks like the only way anything can ever get put on the free list
  right now is if a relation or database is dropped.  That doesn't seem
  too good.  I wonder if the background writer shouldn't be trying to
  maintain the free list.  That is, perhaps BgBufferSync() should notice
  when the number of free buffers drops below some threshold, and run
  the clock sweep enough to get it back up to that threshold.
 
 I think this is just a terminology discrepancy. In postgres the free
 list is only used for buffers that contain no useful data at all. The
 only time there are buffers on the free list is at startup or if a
 relation or database is dropped.
 
 Most of the time blocks are read into buffers that already contain
 other data. Candidate buffers to evict are buffers that have been used
 least recently. That's what the clock sweep implements.
 
 What the bgwriter's responsible for is looking at the buffers *ahead*
 of the clock sweep and flushing them to disk. They stay in ram and
 don't go on the free list, all that changes is that they're clean and
 therefore can be reused without having to do any i/o.
 
 I'm a bit skeptical that this works, because as soon as the bgwriter
 saturates the I/O the OS will throttle the rate at which it can write.
 When that happens, even a few dozen milliseconds will be plenty to
 allow the purely user-space processes consuming the buffers to catch
 up instantly.
 
 But Greg Smith has done a lot of work tuning the bgwriter so that it
 is at least useful in some circumstances. I could well see it being
 useful for systems where latency matters and the i/o is not saturated.

The freelist is almost useless under normal operations, but it's only one
check whether it's empty or not, which could be optimized by checking
(... > -1), or !(... < 0).

Regards,
Radek



Re: [HACKERS] psql \dt and table size

2011-03-23 Thread Pavel Stehule
2011/3/23 Alvaro Herrera alvhe...@commandprompt.com:
 Excerpts from Robert Haas's message of Wed Mar 23 17:24:59 -0300 2011:
 On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle maili...@oopsware.de wrote:
  It struck me today again that \dt+ isn't displaying the accurate table size
  for tables, since it still uses pg_relation_size(). With pg_table_size()
  available since PostgreSQL 9.0, I believe it would be more useful to have
  the total acquired storage displayed, including implicit objects (the
  mentioned case where it was not very useful atm was a table with a big
  TOAST table).

 I guess the threshold question for this patch is whether
 pg_table_size() is a more accurate table size or just a different
 one.

 Not including the toast table and index in the size is just plain wrong.
 Reporting the size without the toast objects is an implementation
 artifact that should not be done unless explicitly requested.

+1

can we enhance the detail output for tables and show more accurate numbers?

table size: xxx
toast size: xxx
indexes size: xxx

Regards

Pavel Stehule



 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support





[HACKERS] PG Session #2 : Call For Papers

2011-03-23 Thread damien clochard
Hi,

The first PostgreSQL Session organized by Dalibo last February was
a big success: more than 80 participants from the public and private
sectors gathered in Paris.

Dalibo and Oslandia want to carry on this success and are setting up a
new conference day dedicated to PostGIS, on June 23rd in Paris.

http://www.postgresql-sessions.org/en/2/

The call for papers for this second session is now open. Here are some
expected topics (non-exhaustive list):

  * Feedback on information systems architectures using PostGIS in
specific contexts (exotic constraints, high volumes…)
  * decision process of GIS database
  * migration from another spatial DB
  * PostGIS 2.0 new features, WKT Raster
  * Interoperability between PostGIS 2.0 and other software
  * Link between PostgreSQL and PostGIS, and planned evolution
  * etc.

Talks will be 45 minutes long, including a 15-minute question-and-answer session.

Thank you for sending the following elements to contact@postgresql-
sessions.org before April 22nd:
  * Presentation title
  * Name of speaker(s)
  * Summary of presentation (100-250 words)

For any question, do not hesitate to contact us:
  cont...@postgresql-sessions.org.

If you have any friends or colleagues that you think would be interested
in giving a talk, please forward this message to them!

PS : Slides from the first PG Session are available at :

http://www.postgresql-sessions.org/en/1/


-- 
damien clochard
dalibo.com | dalibo.org



[HACKERS] Lack of post creation hook on extension

2011-03-23 Thread Kohei KaiGai
I found a problem: extension.c does not invoke the post-creation hook
at creation time, although no module currently supports assigning
security labels to extension objects.

The attached patch tries to fix it.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-extension-hook.patch
Description: Binary data



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It looks like the only way anything can ever get put on the free list
 right now is if a relation or database is dropped.  That doesn't seem
 too good.

Why not?  AIUI the free list is only for buffers that are totally dead,
ie contain no info that's possibly of interest to anybody.  It is *not*
meant to substitute for running the clock sweep when you have to discard
a live buffer.

regards, tom lane



[HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Jaime Casanova
Hi,

I'm looking for the history of pg_exec_query_string() and found that
it disappeared in 7.4, so I tried to look at the git log to find out if it
was renamed or removed completely, and found only these 3 commits:
http://git.postgresql.org/gitweb?p=postgresql.git;a=search;h=cd48aa0e9636c6225d34f366a9d71e7fc118c9f3;st=commit;s=pg_exec_query_string,
and none of them removes or renames it...

I'm pretty sure exec_simple_query() is what we used to call
pg_exec_query_string(), but I had to look for functions with similar
names and then compare the code, not very elegant :(

any idea what happened to this part of the history?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL



Re: [HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Cédric Villemain
2011/3/23 Jaime Casanova ja...@2ndquadrant.com:
 Hi,

 I'm looking for the history of pg_exec_query_string() and found that
 it disappeared in 7.4, so I tried to look at the git log to find out if it
 was renamed or removed completely, and found only these 3 commits:
 http://git.postgresql.org/gitweb?p=postgresql.git;a=search;h=cd48aa0e9636c6225d34f366a9d71e7fc118c9f3;st=commit;s=pg_exec_query_string,
 and none of them removes or renames it...

 I'm pretty sure exec_simple_query() is what we used to call
 pg_exec_query_string(), but I had to look for functions with similar
 names and then compare the code, not very elegant :(

 any idea what happened to this part of the history?

This:
git log --pretty=oneline -S'pg_exec_query_string' origin/REL7_4_STABLE
git show de28dc9a04c4df5d711815b7a518501b43535a26

gives me the answer:

commit de28dc9a04c4df5d711815b7a518501b43535a26
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Fri May 2 20:54:36 2003 +

Portal and memory management infrastructure for extended query protocol.
Both plannable queries and utility commands are now always executed
within Portals, which have been revamped so that they can handle the
load (they used to be good only for single SELECT queries).  Restructure
code to push command-completion-tag selection logic out of postgres.c,
so that it won't have to be duplicated between simple and extended queries.
initdb forced due to addition of a field to Query nodes.


wonderful git :)


 --
 Jaime Casanova         www.2ndQuadrant.com
 Professional PostgreSQL: Soporte y capacitación de PostgreSQL





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



[HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Andrew Hammond
I'm building some partitioning support functions. I'm working on writing one
called clone_indexes_to_partition right now. The idea is to take all the
indexes applied to the parent and create a matching index on the child. Is
there existing code that generates a CREATE INDEX statement given an
indexrelid?

Andrew


Re: [HACKERS] missing history for pg_exec_query_string()

2011-03-23 Thread Jaime Casanova
On Wed, Mar 23, 2011 at 6:50 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:

 This :
 git log --pretty=oneline -S'pg_exec_query_string' origin/REL7_4_STABLE
 git show  de28dc9a04c4df5d711815b7a518501b43535a26

 give me the answer :


ah! ok, so the problem was the way I was asking git for the info...
that -S option looks like a life saver, thanks!

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL



Re: [HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Cédric Villemain
2011/3/24 Andrew Hammond andrew.george.hamm...@gmail.com:
 I'm building some partitioning support functions. I'm working on writing one
 called clone_indexes_to_partition right now. The idea is to take all the
 indexes applied to the parent and create a matching index on the child. Is
 there existing code that generates a CREATE INDEX statement given
 an indexrelid?

Yes, there is:
http://www.postgresql.org/docs/9.0/static/functions-info.html
pg_get_indexdef(index_oid): get CREATE INDEX command for index

 Andrew



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Query to generate CREATE INDEX statement from

2011-03-23 Thread Tom Lane
Andrew Hammond andrew.george.hamm...@gmail.com writes:
 I'm building some partitioning support functions. I'm working on writing one
 called clone_indexes_to_partition right now. The idea is to take all the
 indexes applied to the parent and create a matching index on the child. Is
 there existing code that generates a CREATE INDEX statement given an
 indexrelid?

pg_get_indexdef

regards, tom lane



Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Gokulakannan Somasundaram
Yeah, I looked at it. I don't think it addresses the problem raised here:

http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php

Or maybe I am missing something.

Thanks.

On Wed, Mar 23, 2011 at 7:54 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Mar 23, 2011 at 2:29 AM, Gokulakannan Somasundaram
 gokul...@gmail.com wrote:
  All operations that clear the bit are already WAL-logged.
 
  Is it the case with visibility map also?
  Thanks.

 Yes.  Look at the comment that the patch removes.  That describes the
 problem being fixed.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company