Re: [HACKERS] Page Checksums

2012-01-10 Thread Heikki Linnakangas

On 10.01.2012 02:12, Jim Nasby wrote:

Filesystem CRCs very likely will not happen to data that's in the cache. For 
some users, that's a huge amount of data to leave un-protected.


You can repeat that argument ad infinitum. Even if the CRC covers all 
the pages in the OS buffer cache, it still doesn't cover the pages in 
the shared_buffers, CPU caches, in-transit from one memory bank to 
another etc. You have to draw the line somewhere, and it seems 
reasonable to draw it where the data moves between long-term storage, 
ie. disk, and RAM.



Filesystem bugs do happen... though presumably most of those would be caught by 
the filesystem's CRC check... but you never know!


Yeah. At some point we just have to have faith in the underlying system. 
It's reasonable to provide protection against, or make recovery easier from, bugs 
or hardware faults that happen fairly often in the real world, but a 
can't-trust-no-one attitude is not very helpful.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joachim Wieland j...@mcknight.de writes:
 [ send NOTIFYs to slaves by means of: ]

Good idea.

 I wonder whether it'd be practical to not involve WAL per se in this
 at all, but to transmit NOTIFY messages by having walsender processes
 follow the notify stream (as though they were listeners) and send the
 notify traffic as a separate message stream interleaved with the WAL
 traffic.  We already have, as of a few days ago, the concept of
 additional traffic in the walsender stream besides the WAL data itself,
 so adding notify traffic as another message type should be
 straightforward.

Also good idea.

 It might be a bit tricky to get walreceivers to inject
 the data into the slave-side ring buffer at the right time, ie, not
 until after the commit a given message describes has been replayed;
 but I don't immediately see a reason to think that's infeasible.

When a transaction commits, it would use a full-size commit record and set
a (new) flag in xl_xact_commit.xinfo to show that the commit is paired with
notify traffic.

Get the messages in walreceiver.c's XLogWalRcvProcessMsg() and put them in a
shared hash table. The messages would need to contain the xid of the notifying
transaction and other info needed for LISTEN.

When we hit xact_redo_commit() in xact.c on the standby, we'd check for
messages in the hash table if the notify flag is set and execute the
normal notify code as if the NOTIFY had run locally on the standby. We
can sweep the hash table clean of any old messages each time we run
ProcArrayApplyRecoveryInfo().

Add a new message type to walprotocol.h. Message code 'L' appears to be
available.
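
As a purely illustrative sketch (the names and layout are invented here, not
taken from any patch), such an 'L' message might carry something like:

#include <stdint.h>

#define NAMEDATALEN 64            /* stand-in for the PostgreSQL constant */

/*
 * Hypothetical payload of a walsender 'L' (notify) message: the xid of
 * the notifying transaction plus what LISTEN needs on the standby side.
 */
typedef struct StandbyNotifyMessage
{
    uint32_t    xid;                    /* notifying transaction */
    uint32_t    dboid;                  /* database the NOTIFY belongs to */
    char        channel[NAMEDATALEN];   /* channel name */
    int32_t     payload_len;            /* length of payload that follows */
    /* payload bytes follow, NUL-terminated */
} StandbyNotifyMessage;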

Suggest we add something to the initial handshake from the standby to say
"please send me notify traffic", which we can link to a parameter that
defines the size of a standby_notify_buffer. We don't want all standbys to
receive such traffic unless they really want it, and pg_basebackup
probably doesn't want it either.

If you wanted to get really fancy you could send only some of the
traffic to each standby based on a hash or round-robin algorithm, so we
can spread the listeners across multiple standbys.
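
Purely as an illustration of that last idea (the hash and the parameter are
made up, nothing here is proposed code), picking a standby per channel could
be as simple as:

#include <stdint.h>

/*
 * Toy sketch: route a channel's notify traffic to one of n_standbys by
 * hashing the channel name (djb2 here, just as a placeholder hash).
 */
static int
standby_for_channel(const char *channel, int n_standbys)
{
    uint32_t h = 5381;

    while (*channel)
        h = h * 33 + (unsigned char) *channel++;
    return (int) (h % (uint32_t) n_standbys);
}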

I'll be your reviewer, if you want.

 Going in this direction would mean that slave-side LISTEN only works
 when using walsender/walreceiver, and not with old-style log shipping.
 But personally I don't see a problem with that.  If you're trying to
 LISTEN you probably want pretty up-to-date data anyway.

Which fits the expected use case also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 8:04 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 10.01.2012 02:12, Jim Nasby wrote:

 Filesystem CRCs very likely will not happen to data that's in the cache.
 For some users, that's a huge amount of data to leave un-protected.


 You can repeat that argument ad infinitum. Even if the CRC covers all the
 pages in the OS buffer cache, it still doesn't cover the pages in the
 shared_buffers, CPU caches, in-transit from one memory bank to another etc.
 You have to draw the line somewhere, and it seems reasonable to draw it
 where the data moves between long-term storage, ie. disk, and RAM.

We protect each change with a CRC when we write WAL, so doing the same
thing doesn't sound entirely unreasonable, especially if your database
fits in RAM and we aren't likely to be doing I/O anytime soon. The
long term storage argument may no longer apply in a world with very
large memory.

The question is, when exactly would we check the checksum? When we
lock the block, when we pin it? We certainly can't do it on every
access to the block since we don't even track where that happens in
the code.

I think we could add an option to check the checksum immediately after
we pin a block for the first time, but it would be very expensive and it
sounds like we're re-inventing hardware or OS features again. Figure on
a 50% performance drain, as an estimate.
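
As a purely illustrative sketch of the check-on-first-pin idea (BLCKSZ, the
checksum slot at the start of the page and the CRC routine are simplified
stand-ins, not the real buffer manager code):

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define BLCKSZ 8192             /* stand-in for the real block size */

/* Toy CRC-32 over a page, skipping an assumed 4-byte checksum slot at offset 0. */
static uint32_t
page_crc32(const uint8_t *page)
{
    uint32_t crc = 0xFFFFFFFFU;

    for (size_t i = 4; i < BLCKSZ; i++)
    {
        crc ^= page[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320U : crc >> 1;
    }
    return ~crc;
}

/* What the optional "verify on first pin" hook might boil down to. */
static bool
verify_page_on_first_pin(const uint8_t *page)
{
    /* assume the checksum lives in the first 4 bytes of the page */
    uint32_t stored = (uint32_t) page[0] | ((uint32_t) page[1] << 8) |
                      ((uint32_t) page[2] << 16) | ((uint32_t) page[3] << 24);

    return stored == page_crc32(page);
}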

That is a level of protection no other DBMS offers, so that is either
an advantage or a warning. Jim, if you want this, please do the
research and work out what the probability of losing shared-buffer
data in your ECC RAM really is (via the old Google memory academic
paper), so we are doing it for quantifiable reasons, and verify that
the cost/benefit means you would actually use it if we built it.
Research into requirements is at least as important and time-consuming
as research on possible designs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generate call graphs in run-time

2012-01-10 Thread Joel Jacobson
Is this only to print out the stack upon errors? Looks like the stack is in
the variable error_context_stack. Is it always available containing all the
parent functions, even when there is no error? Can I reach it from within
pgstat.c?

2012/1/10 Jim Nasby j...@nasby.net

 On Jan 9, 2012, at 2:08 PM, Joel Jacobson wrote:
  Generates call graphs of function calls within a transaction in run-time.

 Related to this... we had Command Prompt write a function for us that
 would spit out the complete call-graph of the current call stack whenever
 it was called. Alvaro didn't need to add any hooks to the backend to
 accomplish this, so it would seem that the call stack is already available.
 That might simplify what you're doing.

 I don't see this posted on pgFoundry yet, so I've attached it.
 --
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net




-- 
Joel Jacobson
Trustly
+46703603801
https://trustly.com


Re: [HACKERS] LWLOCK_STATS

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 12:24 AM, Jim Nasby j...@nasby.net wrote:

 IIRC, pgbench is *extremely* write-heavy. There are probably not that many 
 systems that operate that way. I suspect that most OLTP systems read more 
 than they write, and some probably have as much as a 10:1 ratio.

IMHO the main PostgreSQL design objective is handling a flexible, general
purpose, 100% write workload, which is why Hot Standby and
LISTEN/NOTIFY are so important as mechanisms for offloading read
traffic to other places, so we can scale the total solution beyond one
node without giving up the power of SQL.

So benchmarking write-heavy workloads and separately benchmarking
read-only workloads is more representative.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-10 Thread Benedikt Grundmann
On 10/01/12 09:07, Simon Riggs wrote:
  You can repeat that argument ad infinitum. Even if the CRC covers all the
  pages in the OS buffer cache, it still doesn't cover the pages in the
  shared_buffers, CPU caches, in-transit from one memory bank to another etc.
  You have to draw the line somewhere, and it seems reasonable to draw it
  where the data moves between long-term storage, ie. disk, and RAM.
 
 We protect each change with a CRC when we write WAL, so doing the same
 thing doesn't sound entirely unreasonable, especially if your database
 fits in RAM and we aren't likely to be doing I/O anytime soon. The
 long term storage argument may no longer apply in a world with very
 large memory.
 
I'm not so sure about that.  The experience we have is that storage
and memory don't grow as fast as demand.  Maybe we are in a minority, 
but at Jane Street "memory size < database size" is sadly true for most 
of the important databases.

Concretely, the two most important databases are 

715 GB

and

473 GB 

in size (the second used to be much closer to the first one in size but
we recently archived a lot of data).

In both databases there is a small set of tables that use the majority of
the disk space.  Those tables are also the most used tables.  Typically
the size of one of those tables is between 1x and 3x the size of memory, and
the cumulative size of all indices on such a table is normally roughly the
same as the table itself.

Cheers,

Bene

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts

2012-01-10 Thread Heikki Linnakangas

On 09.01.2012 15:49, Ryan Kelly wrote:

On Mon, Jan 09, 2012 at 10:35:50AM +0200, Heikki Linnakangas wrote:

That assumes that it's safe to longjmp out of PQconnectdbParams at
any instant. It's not.

I'm guessing because it could result in a resource leak?


Yes, and other unfinished business, too.


I think you'd need to use the asynchronous connection functions
PQconnectStartParams() and PQconnectPoll(), and select().

New patch attached.


Thanks, some comments:

* Why do you need the timeout?

* If a SIGINT arrives before you set sigint_interrupt_enabled, it just 
sets cancel_pressed but doesn't jump out of the connection attempt. You 
need to explicitly check cancel_pressed after setting 
sigint_interrupt_enabled to close that race condition.


* You have to reinitialize the fd mask with FD_ZERO/SET before each call 
to select(). select() modifies the mask.


* In case of PGRES_POLLING_WRITING, you have to wait until the socket 
becomes writable, not readable.


Attached is a new version that fixes those.

There's one caveat in the libpq docs about PQconnectStart/PQconnectPoll:


The connect_timeout connection parameter is ignored when using PQconnectPoll; 
it is the application's responsibility to decide whether an excessive amount of 
time has elapsed. Otherwise, PQconnectStart followed by a PQconnectPoll loop is 
equivalent to PQconnectdb.


So after this patch, connect_timeout will be ignored in \connect. That 
probably needs to be fixed. You could incorporate a timeout fairly 
easily into the select() calls, but unfortunately there's no easy way to 
get the connect_timeout value. You could parse the connection string 
the user gave with PQconninfoParse(), but the effective timeout setting 
could come from a configuration file, too.
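
For illustration, the PQconninfoParse() route could look roughly like this (a
sketch only; as noted above, it misses settings coming from the environment or
a configuration file):

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/* Pull connect_timeout (in seconds, 0 = none found) out of a conninfo string. */
static int
conninfo_connect_timeout(const char *conninfo)
{
    char       *errmsg = NULL;
    PQconninfoOption *opts = PQconninfoParse(conninfo, &errmsg);
    int         timeout = 0;

    if (opts == NULL)
    {
        PQfreemem(errmsg);      /* parse failed; error text not used here */
        return 0;
    }
    for (PQconninfoOption *o = opts; o->keyword != NULL; o++)
    {
        if (strcmp(o->keyword, "connect_timeout") == 0 && o->val != NULL)
            timeout = atoi(o->val);
    }
    PQconninfoFree(opts);
    return timeout;
}

The resulting value could then be turned into the timeval passed to the
select() calls.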


Not sure what to do about that. If there was a PQconnectTimeout(conn) 
function, similar to PQuser(conn), PQhost(conn) et al, you could use 
that. Maybe we should add that, or even better, a generic function that 
could be used to return not just connect_timeout, but all the connection 
options in effect in a connection.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 69fac83..135d022 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1515,8 +1515,8 @@ param_is_newly_set(const char *old_val, const char *new_val)
 static bool
 do_connect(char *dbname, char *user, char *host, char *port)
 {
-	PGconn	   *o_conn = pset.db,
-			   *n_conn;
+	PGconn	   *o_conn = pset.db;
+	PGconn	   *n_conn = NULL;
 	char	   *password = NULL;
 
 	if (!dbname)
@@ -1570,14 +1570,67 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		keywords[7] = NULL;
 		values[7] = NULL;
 
-		n_conn = PQconnectdbParams(keywords, values, true);
+		/* attempt connection asynchronously */
+		n_conn = PQconnectStartParams(keywords, values, true);
+
+		if (sigsetjmp(sigint_interrupt_jmp, 1) != 0)
+		{
+			/* interrupted during connection attempt */
+			PQfinish(n_conn);
+			n_conn = NULL;
+		}
+		else
+		{
+			while (true)
+			{
+				int			poll_res;
+				int			rc;
+				fd_set		read_mask,
+							write_mask;
+
+				poll_res = PQconnectPoll(n_conn);
+				if (poll_res == PGRES_POLLING_OK ||
+					poll_res == PGRES_POLLING_FAILED)
+				{
+					break;
+				}
+
+				FD_ZERO(&read_mask);
+				FD_ZERO(&write_mask);
+
+				if (poll_res == PGRES_POLLING_READING)
+					FD_SET(PQsocket(n_conn), &read_mask);
+				if (poll_res == PGRES_POLLING_WRITING)
+					FD_SET(PQsocket(n_conn), &write_mask);
+
+				sigint_interrupt_enabled = true;
+				if (cancel_pressed)
+				{
+					/* interrupted during connection attempt */
+					PQfinish(n_conn);
+					n_conn = NULL;
+					sigint_interrupt_enabled = false;
+					break;
+				}
+				rc = select(PQsocket(n_conn) + 1,
+							&read_mask, &write_mask, NULL,
+							NULL);
+				sigint_interrupt_enabled = false;
+
+				if (rc < 0 && errno != EINTR)
+					break;
+			}
+		}
 
 		free(keywords);
 		free(values);
 
 		/* We can immediately discard the password -- no longer needed */
 		if (password)
+		{
 			free(password);
+			password = NULL;
+		}
 
 		if (PQstatus(n_conn) == CONNECTION_OK)
 			break;
@@ -1586,7 +1639,7 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		 * Connection attempt failed; either retry the connection attempt with
 		 * a new password, or give up.
 		 */
-		if (!password && PQconnectionNeedsPassword(n_conn) && pset.getPassword != TRI_NO)
+		if (PQconnectionNeedsPassword(n_conn) && pset.getPassword != TRI_NO)
 		{
 			PQfinish(n_conn);
 			password = prompt_for_password(user);
@@ -1600,7 +1653,8 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		 */
 		if (pset.cur_cmd_interactive)
 		{
-			psql_error("%s", PQerrorMessage(n_conn));
+			if (n_conn)
+				psql_error("%s", PQerrorMessage(n_conn));
 
 			/* pset.db is left unmodified */
 			if (o_conn)
@@ -1608,7 +1662,9 @@ do_connect(char *dbname, char *user, char *host, char 

[HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-10 Thread Kohei KaiGai
This patch adds a new GUC, sepgsql.client_label, that allows the client
process to switch its privileges to another set, as long as the
system security policy admits this transition.
To support this feature, I ported two permissions from the process class
of SELinux: setcurrent and dyntransition. The first one checks
whether the client has the right to switch its privilege at all, and the
other one checks a particular transition path from X to Y.

This feature might seem to break an assumption of sepgsql's security
model. However, a one-directional domain transition from a
more-privileged to a less-privileged domain at the user's request is
also supported by the operating system, and it is a useful feature for
restricting an application's capabilities at the beginning of the session.

A few weeks ago, I got a requirement from Joshua Brindle. He is
working on a web application that uses a CAC (Common Access Card) for its
authentication, and he wanted to integrate its security credential with the
security labels of selinux/sepgsql.
One problem was that the system environment could not use
labeled networking (IPsec); thus, switching the security label of
processes on the web-server side was not an option. Another solution is
to port the dynamic-transition feature into sepgsql, as an analogy of
the operating system.

An expected scenario is below:
The web server is running in the WEBSERV domain. It is allowed to
connect to PostgreSQL, and also allowed to invoke a trusted procedure
that takes the security credential within the CAC as an argument, but
nothing else is allowed.
The trusted procedure is allowed to reference a table that maps
security credentials to the security labels to be assigned, and then it
switches the security label of the client to CLIENT_n.
CLIENT_n shall be allowed to access tables, functions and other objects
according to the security policy, and is also allowed to reset
sepgsql.client_label to revert to WEBSERV. However, it is not
able to switch to any other domain without the security credential stored
within the CAC card.

Joshua and I agreed this scenario is reasonable and secure,
so we'd like to suggest this new feature for the v9.2 timeline.

Thanks,

[*1] CAC - Common Access Card
http://en.wikipedia.org/wiki/Common_Access_Card
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.2-guc-sepgsql.client_label.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Joachim Wieland
On Tue, Jan 10, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So this design is non-optimal both for existing uses and for the
 proposed new uses, which means nobody will like it.  You could
 ameliorate #1 by adding a GUC that determines whether NOTIFY actually
 writes WAL, but that's pretty ugly.  In any case ISTM that problem #2
 means this design is basically broken.

I chose to do it this way because it seemed like the most natural way
to do it (which of course doesn't mean it's the best)  :-). I agree
that there should be a way to avoid the replication of the NOTIFYs.
Regarding your second point though, remember that on the master we
write notifications to the queue in pre-commit. And we also don't
interleave notifications of different transactions. So once the commit
record makes it to the standby, all the notifications are already
there, just as on the master. In a burst of notifications, both
solutions should more or less behave the same way but yes, the one
involving the WAL file would be slower as it goes to the file system
and back.

 I wonder whether it'd be practical to not involve WAL per se in this
 at all, but to transmit NOTIFY messages by having walsender processes
 follow the notify stream (as though they were listeners) and send the
 notify traffic as a separate message stream interleaved with the WAL
 traffic.

Agreed, having walsender/receiver work as NOTIFY proxies is kinda smart...


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Dave Cramer
Hi Oleg,


On Sun, Jan 8, 2012 at 1:19 PM, Oleg Bartunov o...@sai.msu.su wrote:
 Dave,

 The situation with pgsphere is such that I think we need a new developer, since
 Janko keeps silent :)  I wrote to him several times, since I wanted to say that
 pgsphere could now benefit very much from our KNNGiST feature. This is
 the number one development from my point of view. Teodor and I have no
 time to work on pgsphere, sorry. But there are some astronomers I'm working
 with who can take part in this. Sergey Karpov has done extensive benchmarks
 of q3c, rtree and pgsphere and found that the latter still has some benefits
 in some workloads, so we are interested in its development.


 Regards,
 Oleg

So where do we go from here ?


Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca




 On Fri, 6 Jan 2012, Andrew Dunstan wrote:



 On 01/06/2012 12:32 PM, Dave Cramer wrote:

 I've been asked by someone to support pgshpere.

 It would appear that the two project owners are MIA. If anyone knows
 different can they let me know ?

 Does anyone have any objection to me taking over the project?



 One of the owners is Teodor, who is a core committer ... I hope he's not
 MIA.

 cheers

 andrew





        Regards,
                Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLOCK_STATS

2012-01-10 Thread Merlin Moncure
On Tue, Jan 10, 2012 at 3:16 AM, Simon Riggs si...@2ndquadrant.com wrote:
 So benchmarking write-heavy workloads and separately benchmarking
 read-only workloads is more representative.

Absolutely.  High write activity applications are much more difficult
to optimize with simple tricks like client-side caching.  Also,
storage is finally moving out of the dark ages, so servers with high
write transaction rates are no longer necessarily I/O bound on
reasonable hardware.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Andrew Dunstan



On 01/10/2012 09:04 AM, Dave Cramer wrote:


So where do we go from here ?



First, please note that -hackers is not the right place for this 
discussion. pgsphere is a pgfoundry project, which is not the province 
of -hackers.


As I suggested, the best solution is for Teodor to add you as a project 
admin. Oleg, could you please follow this up?


Speaking with my pgfoundry admin hat on, I am extremely reluctant to 
take control of a project away from its owners. If you don't get any 
action in a week or so, you can approach the pgfoundry admins for help. 
In the meantime you might want to fork the code onto github or bitbucket.




cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Oleg Bartunov
I think, Dave, you should fork the project; I suggest a different name, pgsphere-2,
to avoid confusion. GitHub would be ok. Teodor and I are really
busy guys, so I don't believe we could participate much except for
discussion and testing. We implemented KNNGiST; you add neighbourhood
search support to pgsphere :)

Oleg

On Tue, 10 Jan 2012, Dave Cramer wrote:


So where do we go from here ?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 12:56 PM, Joachim Wieland j...@mcknight.de wrote:

 I chose to do it this way because it seemed like the most natural way
 to do it (which of course doesn't mean it's the best)  :-).

If it's any consolation, it's exactly how I would have done it too, up
until about two months ago, and I remember discussing almost exactly the
design you presented with someone in Rome last year.

Anyway, it's a good feature, so I hope you have time.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It might be a bit tricky to get walreceivers to inject
 the data into the slave-side ring buffer at the right time, ie, not
 until after the commit a given message describes has been replayed;
 but I don't immediately see a reason to think that's infeasible.

 [ Simon sketches a design for that ]

Seems a bit overcomplicated.  I was just thinking of having walreceiver
note the WAL endpoint at the instant of receipt of a notify message,
and not release the notify message to the slave ring buffer until WAL
replay has advanced that far.  You'd need to lay down ground rules about
how the walsender times the insertion of notify messages relative to
WAL in its output.  But I don't see the need for either explicit markers
in the WAL stream or a hash table.  Indeed, a hash table scares me
because it doesn't clearly guarantee that notifies will be released in
arrival order.
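
To illustrate the shape of that (all names invented here, nothing from an
actual patch), walreceiver could stamp each incoming notify with the WAL end
point seen at receipt and drain a simple FIFO as replay advances, which also
preserves arrival order:

#include <stdint.h>
#include <stddef.h>

typedef uint64_t WalPtr;            /* stand-in for XLogRecPtr */

typedef struct PendingNotify
{
    WalPtr      release_at;         /* WAL endpoint noted at receipt */
    const char *channel;
    const char *payload;
    struct PendingNotify *next;     /* FIFO: next message in arrival order */
} PendingNotify;

/* Stand-in for putting a message into the standby-side notify ring buffer. */
static void
release_to_ring_buffer(PendingNotify *n)
{
    (void) n;
}

/* Release everything whose commit has been replayed; return the new head. */
static PendingNotify *
release_ready_notifies(PendingNotify *head, WalPtr replayed_upto)
{
    while (head != NULL && head->release_at <= replayed_upto)
    {
        PendingNotify *done = head;

        head = done->next;
        release_to_ring_buffer(done);
    }
    return head;
}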

 Suggest we add something to the initial handshake from the standby to say
 "please send me notify traffic",

+1 on that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq: PQcmdStatus, PQcmdTuples signatures can be painlessly improved

2012-01-10 Thread Peter Eisentraut
On tis, 2011-12-13 at 07:55 -0500, Alex Goncharov wrote:
   char *PQcmdStatus(PGresult *res);
   char *PQcmdTuples(PGresult *res);
 
 Unreasonable:
 
   a. What, these two can modify 'res' I pass in?..
 
   b. Oh, yes, because they return 'char *' pointing to
  'res->cmdStatus+n', so, a libpq user may write:
 
 char *s = PQcmdStatus(res);
 *s = 'x';
 
  and have 'res' modified.  (Would be the user's fault, of course.)
 
Note that const PGresult * would only warn against changing the fields
of the PGresult struct.  It doesn't do anything about changing the data
pointed to by pointers in the PGresult struct.  So what you are saying
doesn't follow.
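
A minimal illustration of that point, with a made-up struct rather than
libpq's own:

struct result
{
    char   *cmdStatus;
};

void
demo(const struct result *res)
{
    /* res->cmdStatus = "other";      would not compile: res points to const */
    res->cmdStatus[0] = 'x';       /* compiles: the pointed-to chars are not const */
}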



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-10 Thread Andrew Dunstan



On 01/03/2012 09:11 PM, Andrew Dunstan wrote:



On 01/03/2012 08:40 PM, Robert Haas wrote:
On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
I'd suppose it interesting to add a table to pg_catalog containing 
this data.

- it is useless overhead

I tend to agree.


I am expecting that defining some constants in Perl or Python is simple

Presumably one could instead write a script to transform the list of
constants into a .pm file that could be loaded into the background, or
whatever PL/python's equivalent of that concept is.  Not sure if
there's a better way to do it.


Yeah, I'm with you and Pavel. Here's my quick perl one-liner to 
produce a set of SPI_* constants for pl/perl. I'm looking at the best 
way to include this in the bootstrap code.


   perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; }
   print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;'
   src/include/executor/spi.h






Actually, now I look closer I see that PLPerl passes back a stringified 
status from SPI_execute(), so there is no great need for setting up 
these constants. It's probably water under the bridge now, but maybe 
PLPython should have done this too.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-10 Thread Pavel Stehule
2012/1/10 Andrew Dunstan and...@dunslane.net:


 On 01/03/2012 09:11 PM, Andrew Dunstan wrote:



 On 01/03/2012 08:40 PM, Robert Haas wrote:

 On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 I'd suppose it interesting to add a table to pg_catalog containing this
 data.

 - it is useless overhead

 I tend to agree.

 I am expecting that defining some constants in Perl or Python is simple

 Presumably one could instead write a script to transform the list of
 constants into a .pm file that could be loaded into the background, or
 whatever PL/python's equivalent of that concept is.  Not sure if
 there's a better way to do it.


 Yeah, I'm with you and Pavel. Here's my quick perl one-liner to produce a
 set of SPI_* constants for pl/perl. I'm looking at the best way to include
 this in the bootstrap code.

   perl -ne 'BEGIN { print "use constant\n{\n"; } END { print "};\n"; }
   print "\t$1 => $2,\n" if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;'
   src/include/executor/spi.h





 Actually, now I look closer I see that PLPerl passes back a stringified
 status from SPI_execute(), so there is no great need for setting up these
 constants. It's probably water under the bridge now, but maybe PLPython
 should have done this too.


This is not documented well - I see nothing about the result value in the docs.
Does it raise an exception when SPI returns some bad result value?

Regards

Pavel

 cheers

 andrew

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 4:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On Tue, Jan 10, 2012 at 5:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It might be a bit tricky to get walreceivers to inject
 the data into the slave-side ring buffer at the right time, ie, not
 until after the commit a given message describes has been replayed;
 but I don't immediately see a reason to think that's infeasible.

 [ Simon sketches a design for that ]

 Seems a bit overcomplicated.  I was just thinking of having walreceiver
 note the WAL endpoint at the instant of receipt of a notify message,
 and not release the notify message to the slave ring buffer until WAL
 replay has advanced that far.  You'd need to lay down ground rules about
 how the walsender times the insertion of notify messages relative to
 WAL in its output.

You have to store the messages somewhere until they're needed. If that
somewhere isn't on the standby, very close to the Startup process, then
it's going to be very slow. Putting a marker in the WAL stream
guarantees arrival order. The hash table was just a place to store
them until they're needed; it could be a ring buffer as well.

Inserts into the slave ring buffer already have an xid on them, so the
test will probably already cope with messages inserted but for which
the parent xid has not committed. The only problem is coping with
possible out-of-sequence messages.

 But I don't see the need for either explicit markers
 in the WAL stream or a hash table.  Indeed, a hash table scares me
 because it doesn't clearly guarantee that notifies will be released in
 arrival order.

The hash table is clearly not the thing providing an arrival-order
guarantee; it was just a cache.

You have a few choices: (1) you either send the messages while holding
an exclusive lock, or (2) you send them as they come and buffer them,
then reorder them using the WAL log sequence, since that matches the
original commit sequence. Or (3) add a sequence number to the messages
sent by WALSender, so that the WALReceiver can buffer them locally and
insert them in the correct order into the normal ring buffer - so in
(3) the message sequence and the WAL sequence match, but the mechanism
is different.

(1) is out because the purpose of offloading to the standby is to give
the master more capacity. If we slow it down in order to serve the
standby we're doing things the wrong way around.

I was choosing (2); maybe you prefer (3) or another design entirely.
They look very similar to me and about the same complexity; it's just
copying data and preserving sequence.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Jan-Benedict Glaw
On Sun, 2012-01-08 22:19:53 +0400, Oleg Bartunov o...@sai.msu.su wrote:
 pgsphere could now benefit very much from our KNNGiST feature. This is
 the number one development from my point of view. Teodor and I have no
 time to work on pgsphere, sorry. But there are some astronomers I'm working
 with who can take part in this. Sergey Karpov has done extensive benchmarks
 of q3c, rtree and pgsphere and found that the latter still has some benefits
 in some workloads, so we are interested in its development.

Could the PostGIS stuff be abused for stellar coordinates?

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  Träume nicht von Deinem Leben: Lebe Deinen Traum!
the second  :


signature.asc
Description: Digital signature


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-10 Thread Andrew Dunstan



On 01/10/2012 12:34 PM, Pavel Stehule wrote:


Actually, now I look closer I see that PLPerl passes back a stringified
status from SPI_execute(), so there is no great need for setting up these
constants. It's probably water under the bridge now, but maybe PLPython
should have done this too.


This is not documented well - I see nothing about the result value in the docs.
Does it raise an exception when SPI returns some bad result value?



The docs state:

   You can then access the command status (e.g., SPI_OK_INSERT) like this:

   $res = $rv->{status};


And it works like this:

   andrew=# do 'my $rv = spi_exec_query("select 1 as a");
   elog(NOTICE, $rv->{status});' language plperl;
   NOTICE:  SPI_OK_SELECT
   CONTEXT:  PL/Perl anonymous code block
   DO
   andrew=#

An error causes the function to end, so it never sees the error status:

   andrew=# do 'my $rv = spi_exec_query("select blurfl");
   elog(NOTICE, $rv->{status});' language plperl;
   ERROR:  column "blurfl" does not exist at line 1.
   CONTEXT:  PL/Perl anonymous code block
   andrew=#


If you think more documentation is needed, submit a patch.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-10 Thread Pavel Stehule
2012/1/10 Andrew Dunstan and...@dunslane.net:


 On 01/10/2012 12:34 PM, Pavel Stehule wrote:


 Actually, now I look closer I see that PLPerl passes back a stringified
 status from SPI_execute(), so there is no great need for setting up these
 constants. It's probably water under the bridge now, but maybe PLPython
 should have done this too.

 This is not documented well - I see nothing about the result value in the docs.
 Does it raise an exception when SPI returns some bad result value?



 The docs state:

   You can then access the command status (e.g., SPI_OK_INSERT) like this:

   $res = $rv->{status};


 And it works like this:

   andrew=# do 'my $rv = spi_exec_query("select 1 as a");
   elog(NOTICE, $rv->{status});' language plperl;
   NOTICE:  SPI_OK_SELECT
   CONTEXT:  PL/Perl anonymous code block
   DO
   andrew=#

 An error causes the function to end, so it never sees the error status:

   andrew=# do 'my $rv = spi_exec_query("select blurfl");
   elog(NOTICE, $rv->{status});' language plperl;
   ERROR:  column "blurfl" does not exist at line 1.
   CONTEXT:  PL/Perl anonymous code block
   andrew=#


 If you think more documentation is needed, submit a patch.

I was blind, I am sorry - I was missing an explicit note that the command
status is a string, that is all.

Regards

Pavel



 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq: PQcmdStatus, PQcmdTuples signatures can be painlessly improved

2012-01-10 Thread Alex Goncharov
,--- You/Peter (Tue, 10 Jan 2012 19:13:42 +0200) *
| On tis, 2011-12-13 at 07:55 -0500, Alex Goncharov wrote:
|char *PQcmdStatus(PGresult *res);
|char *PQcmdTuples(PGresult *res);
|  
|  Unreasonable:
|  
|a. What, these two can modify 'res' I pass in?..
|  
|b. Oh, yes, because they return 'char *' pointing to
|   'res->cmdStatus+n', so, a libpq user may write:
|  
|  char *s = PQcmdStatus(res);
|  *s = 'x';
|  
|   and have 'res' modified.  (Would be the user's fault, of course.)
|  
| Note that const PGresult * would only warn against changing the
| fields

It would not warn, it would err (the compilation should fail).

| of the PGresult struct.  It doesn't do anything about changing the data
| pointed to by pointers in the PGresult struct.  So what you are saying
| doesn't follow.

By this logic, passing 'const struct foo *' doesn't have any point and
value, for any function.  But we know that this is done (and thank you
for that) in many cases -- a good style, self-documentation and some
protection.

E.g. here:

,--- I/Alex (Tue, 13 Dec 2011 07:55:45 -0500) *
| Compare:
| 
|   int PQntuples(const PGresult *res)
| 
| Reasonable: doesn't modify 'res'.
`-*

BTW, I have not submitted the context differences, as suggested, only
because of extreme overload at work and the need to do a careful
caller and documentation analysis. I still hope to be able to do it in
a reasonably near future.

-- Alex -- alex-goncha...@comcast.net --

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-10 Thread Greg Smith

On 1/5/12 5:04 AM, Benedikt Grundmann wrote:

I have a question of how to benchmark hardware to determine
the appropriate ratio of seq_page_cost vs random_page_cost.

Emails in this mailing lists archive seem to indicate that
1.0 vs 3.0 - 4.0 are appropriate values on modern hardware.

Which surprised me a bit as I had thought that on actual
harddrives (ignoring SSDs) random_page_cost is higher.
I guess that the number tries to reflect caching of the
relevant pages in memory and modern hardware you have
more of that?


That sort of thing is one reason why all attempts so far to set 
random_page_cost based on physical characteristics haven't gone anywhere 
useful.  The setting is sort of overloaded right now; it's a fuzzy mix 
of true random seek cost blended with some notion of cache percentage. 
Trying to bring some measurements to bear on it is a less effective 
approach than what people actually do here.  Monitor the profile of 
query execution, change the value, see what happens.  Use that as 
feedback for what direction to keep going; repeat until you're just 
spinning with no improvements.


It's easy to measure the actual read times and set the value based on 
that instead.  But that doesn't actually work out so well.  There are at 
least three problems in that area:


-Timing information is sometimes very expensive to collect.  This I 
expect to at least document and usefully quantify as a 9.2 feature.


-Basing query execution decisions on what is already in the cache leads 
to all sorts of nasty feedback situations where you optimize for the 
short term, for example using an index already in cache, while never 
reading in what would be a superior long term choice because it seems 
too expensive.


-Making a major adjustment to the query planning model like this would 
require a large performance regression testing framework to evaluate the 
results in.



We are not sure if the database used to choose differently
before the move to the new hardware and the hardware is
performing worse for random seeks.  Or if the planner is
now making different choices.


I don't recommend ever deploying new hardware without first doing some 
low-level benchmarks to validate its performance.  Once stuff goes into 
production, you can't do that anymore.  See 
http://www.2ndquadrant.com/en/talks/ for my hardware benchmarking talks 
if you'd like some ideas on what to collect.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Oleg Bartunov

On Tue, 10 Jan 2012, Jan-Benedict Glaw wrote:


On Sun, 2012-01-08 22:19:53 +0400, Oleg Bartunov o...@sai.msu.su wrote:

pgsphere could now benefit very much from our KNNGiST feature. This is
the number one development from my point of view. Teodor and I have no
time to work on pgsphere, sorry. But there are some astronomers I'm working
with who can take part in this. Sergey Karpov has done extensive benchmarks
of q3c, rtree and pgsphere and found that the latter still has some benefits
in some workloads, so we are interested in its development.


Could the PostGIS stuff be abused for stellar coordinates?


There is no difference in principle between the celestial sphere and the earth;
it's a matter of conversion between coordinates.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-10 Thread Dimitri Fontaine
Hi,

Robert Haas robertmh...@gmail.com writes:
 Maybe we should try to split the baby here and defer the question of
 whether to expose any of the parse tree internals, and if so how much,
 to a future release.  It seems to me that we could design a fairly
 useful set of functionality around AFTER-CREATE, BEFORE-DROP, and
 maybe even AFTER-ALTER triggers without exposing any parse tree
 details.

+1

Also remember that you have a “normalized” command string to play with.
Lots of use cases are already ok here. The other ones would need a tree
representation that's easy to consume, which in the current state of
affairs (I saw no progress on the JSON data type and facilities) is very
hard to imagine when you consider PLpgSQL.

So unless I hear about a show stopper, I'm going to work some more on
the command trigger patch where I still had some rough edges to polish.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Greg Smith

On 1/10/12 9:54 AM, Andrew Dunstan wrote:

Speaking with my pgfoundry admin hat on, I am extremely reluctant to
take control of a project away from its owners. If you don't get any
action in a week or so, you can approach the pgfoundry admins for help.
In the meantime you might want to fork the code onto github or bitbucket.


Man, the pgfoundry admin hat has to be one of the least collectible ones 
around.  Not exactly a lot of demand for them, and everyone who has one 
would happily give theirs away.


I don't want to drag this off-topic thread on, but it's worth mentioning 
that http://wiki.postgresql.org/wiki/Project_Hosting has started a small 
migration guide of sorts for where else you might host this sort of 
project.  It's not necessarily obvious what pgfoundry provides 
relative to other sites, or what the trade-offs in the other 
possibilities are.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsphere

2012-01-10 Thread Jan-Benedict Glaw
On Wed, 2012-01-11 00:43:25 +0400, Oleg Bartunov o...@sai.msu.su wrote:
 On Tue, 10 Jan 2012, Jan-Benedict Glaw wrote:
  On Sun, 2012-01-08 22:19:53 +0400, Oleg Bartunov o...@sai.msu.su wrote:
   pgsphere could now benefit very much from our KNNGiST feature.
   This is the number one development from my point of view. Teodor
   and I have no time to work on pgsphere, sorry. But there are
   some astronomers I'm working with who can take part in this.
   Sergey Karpov has done extensive benchmarks of q3c, rtree and
   pgsphere and found that the latter still has some benefits in some
   workloads, so we are interested in its development.
  Could the PostGIS stuff be abused for stellar coordinates?
 
 There is no difference in principle between the celestial sphere and the earth;
 it's a matter of conversion between coordinates.

I'm a hobby astronomer myself--so I'm asking myself what is actually
needed. I had a look at pgsphere some weeks ago, but didn't use it,
because it seemed to be somewhat dead.

My next approach was to load a dataset as-is (with floating point RA
and Dec) and then create a new table selecting all the data and converting
the coordinates to POINTs. Even with a combined index on (RA, Dec) (as
well as one on the POINT column of the new PostGIS-enabled table), the
latter was quite a bit faster when searching for specific areas etc.

One important thing that's needed is transformation between
equatorial, ecliptic, galactic and probably (local, incorporating
current local longitude/latitude and time) horizontal
coordinates.  What might be important, too, is to be able to change
between J2000.0 and B1950.0 etc.

Probably the easiest thing is to change the (printable) representation
of coordinates, because (depending on the people you talk to),
especially for RA (equivalent to the earth's longitude), there are at
least two totally different notations used:

2h 30min 4.3sec = 2.501194h  =  37.5179°

The other axis is usually simply written in degrees.
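
For illustration, that conversion is just h/min/sec to decimal hours, then
15° per hour of RA:

#include <stdio.h>

/* Right ascension given as h/min/sec, as decimal hours; 1h of RA = 15 degrees. */
static double
ra_hours(int h, int min, double sec)
{
    return h + min / 60.0 + sec / 3600.0;
}

int
main(void)
{
    double hours = ra_hours(2, 30, 4.3);

    printf("%.6fh = %.4f deg\n", hours, hours * 15.0);  /* 2.501194h = 37.5179 deg */
    return 0;
}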

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
  Signature of:   Wenn ich wach bin, träume ich.
  the second  :


signature.asc
Description: Digital signature


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Joachim Wieland
On Tue, Jan 10, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 [ Tom sketches a design ]
 Seems a bit overcomplicated.  I was just thinking of having walreceiver
 note the WAL endpoint at the instant of receipt of a notify message,
 and not release the notify message to the slave ring buffer until WAL
 replay has advanced that far.

How about this: We mark a notify message specially if it is the last
message sent by a transaction and also add a flag to
commit/abort-records, indicating whether or not the transaction has
sent notifys. Now if such a last message is being put into the regular
ring buffer on the standby and the xid is known to have committed or
aborted, signal the backends. Also signal from a commit/abort-record
if the flag is set.

If the notify messages make it to the standby first, we just put
messages of a not-yet-committed transaction into the queue, just as on
the master. Listeners will get signaled when the commit record
arrives. If the commit record arrives first, we signal, but the
listeners won't find anything (at least not the latest notifications).
When the last notify of that transaction finally arrives, the
transaction is known to have committed and the listeners will get
signaled.

What could still happen is that the standby receives notifys, the
commit message and more notifys. Listeners would still eventually get
all the messages, but potentially not all of them at once. Is this a
problem? If so, then we could add a special "stop reading" record into
the queue before we write the notifys, which we subsequently change
into a "continue reading" record once all notifications are in the
queue. Readers would treat a "stop reading" record just like a
not-yet-committed transaction and ignore a "continue reading" record.


 Suggest we add something to the initial handshake from the standby to say
 "please send me notify traffic",

 +1 on that.

From what you said, I imagined this walsender listener as a regular
listener that listens on the union of all the sets of channels that
anybody is listening on on the standby, with the LISTEN transaction on
the standby returning from commit once the listener is known to have been
set up on the master.


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log messages for archive recovery progress

2012-01-10 Thread Euler Taveira de Oliveira
On 08-01-2012 11:59, Satoshi Nagayasu / Uptime Technologies, LLC. wrote:
 [2011-12-08 15:14:36 JST] 16758: LOG:  restored log file 
 00080046 from archive
 [2011-12-08 15:14:36 JST] 16758: LOG:  recoverying 00080046
 [2011-12-08 15:14:36 JST] 16758: LOG:  restored log file 
 00080047 from archive
 [2011-12-08 15:14:36 JST] 16758: LOG:  recoverying 00080047
 cp: cannot stat `/backups/archlog/00080048': No such file or 
 directory
 [2011-12-08 15:14:37 JST] 16758: LOG:  could not restore file 
 00080048 from archive
 [2011-12-08 15:14:37 JST] 16758: LOG:  attempting to look into pg_xlog
 [2011-12-08 15:14:37 JST] 16758: LOG:  recoverying 00080048
 
What about just 'restored log file 00080048 from pg_xlog'
instead of the last two messages? If you can't read from pg_xlog emit 'could
not restore file 00080048 from pg_xlog'.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-10 Thread Robert Haas
On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler da...@kineticode.com wrote:
 On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?

 Yes.

 if b) then this might get a lot more interesting

 JSON is the most popular/likely way to represent that, I think.

On that note, here's an updated version of the patch I posted
upthread, with some regression tests and minimal documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


json-rmh-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] checkpoint writeback via sync_file_range

2012-01-10 Thread Robert Haas
Greg Smith muttered a while ago about wanting to do something with
sync_file_range to improve checkpoint behavior on Linux.  I thought he
was talking about trying to sync only the range of blocks known to be
dirty, which didn't seem like a very exciting idea, but after looking
at the man page for sync_file_range, I think I understand what he was
really going for: sync_file_range allows you to hint the Linux kernel
that you'd like it to clean a certain set of pages.  I further recall
from Greg's previous comments that in the scenarios he's seen,
checkpoint I/O spikes are caused not so much by the data written out
by the checkpoint itself but from the other dirty data in the kernel
buffer cache.  Based on that, I whipped up the attached patch, which,
if sync_file_range is available, simply iterates through everything
that will eventually be fsync'd before beginning the write phase and
tells the Linux kernel to put them all under write-out.
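
A minimal sketch of that idea (not the attached patch itself; the
file-descriptor list is just a stand-in for however the pending-fsync set is
actually tracked):

#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Before the checkpoint write phase, hint the kernel to start write-out
 * of everything that will eventually be fsync'd.  Offset 0 with length 0
 * means "through end of file"; SYNC_FILE_RANGE_WRITE starts write-out
 * without waiting for it to finish.
 */
static void
hint_checkpoint_writeback(const int *pending_fds, int nfds)
{
#ifdef SYNC_FILE_RANGE_WRITE
    for (int i = 0; i < nfds; i++)
        (void) sync_file_range(pending_fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);
#else
    (void) pending_fds;
    (void) nfds;
#endif
}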

I don't know that I have a suitable place to test this, and I'm not
quite sure what a good test setup would look like either, so while
I've tested that this appears to issue the right kernel calls, I am
not sure whether it actually fixes the problem case.  But here's the
patch, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


writeback-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Real-life range datasets

2012-01-10 Thread Stefan Keller
Hi

I'm proposing OpenStreetMap, which is of variable size, up to 250 GB of
XML data for the whole world.
It's downloadable from CloudMade.com or Geofabrik.de and can be
imported into PostgreSQL using osm2pgsql.
It's literally a key/value schema of the real world. I'm using the hstore
option of osm2pgsql, and the hstore index is based on GiST.

I'm running a database instance called PostGIS Terminal which is a
daily extract of Switzerland:
http://labs.geometa.info/postgisterminal/?xapi=node%5Bname%3DHochschule%20Rapperswil%5D

This is a typical query which extracts 'real' highways (being of
geometry linestring, aka line/way) with a speed limit >= 100 km/h:

SELECT ST_AsText(way) geom
FROM osm_line
WHERE tags @> 'highway=>motorway'
AND coalesce(substring((tags->'maxspeed') FROM E'[0-9]+')::int,0) >= 100

Yours, Stefan

2011/12/23 Alexander Korotkov aekorot...@gmail.com:
 Hello,

 On Thu, Dec 22, 2011 at 12:51 PM, Benedikt Grundmann
 bgrundm...@janestreet.com wrote:

 I should be able to give you a table with the same characteristics as
 the instruments table but bogus data by replacing all entries in the
 table with random strings of the same length or something like that.
 I can probably take a little bit of time during this or the next week
 to generate such fake real world data ;-)   Is there an ftp site to
 upload the gzipped pg_dump file to?


 Thank you very much for your response! I'm going to send you accessories for
 upload soon.

 -
 With best regards,
 Alexander Korotkov.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Tom Lane
Joachim Wieland j...@mcknight.de writes:
 On Tue, Jan 10, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Suggest we add something to the initial handshake from the standby to say
 "please send me notify traffic",

 +1 on that.

 From what you said I imagined this walsender listener as a regular
 listener that listens on the union of all sets of channels that
 anybody is listening on on the standby, with the LISTEN transaction on
 the standby return from commit once the listener is known to have been
 set up on the master.

This seems vastly overcomplicated too.  I'd just vote for a simple
yes/no flag, so that receivers that have no interest in notifies don't
have to deal with them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending notifications from the master to the standby

2012-01-10 Thread Tom Lane
BTW ... it occurs to me to ask whether we really have a solid use-case
for having listeners attached to slave servers.  I have personally never
seen an application for LISTEN/NOTIFY in which the listeners were
entirely read-only.  Even if there are one or two cases out there, it's
not clear to me that supporting it is worth the extra complexity that
seems to be needed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoint writeback via sync_file_range

2012-01-10 Thread Greg Smith

On 1/10/12 9:14 PM, Robert Haas wrote:

Based on that, I whipped up the attached patch, which,
if sync_file_range is available, simply iterates through everything
that will eventually be fsync'd before beginning the write phase and
tells the Linux kernel to put them all under write-out.


I hadn't really thought of using it that way.  The kernel expects that 
when this is called the normal way, you're going to track exactly which 
segments you want it to sync.  And that data isn't really passed through 
the fsync absorption code yet; the list of things to fsync has already 
lost that level of detail.


What you're doing here doesn't care though, and I hadn't considered that 
SYNC_FILE_RANGE_WRITE could be used that way on my last pass through its 
docs.  Used this way, it's basically fsync without the wait or 
guarantee; it just tries to push what's already dirty further ahead of 
the write queue than those writes would otherwise be.


One idea I was thinking about here was building a little hash table 
inside of the fsync absorb code, tracking how many absorb operations 
have happened for whatever the most popular relation files are.  The 
idea is that we might say "use sync_file_range every time N calls for 
a relation have come in", just to keep from ever accumulating too many 
writes to any one file before trying to nudge some of it out of there. 
The bat that keeps hitting me in the head here is that right now, a 
single fsync might have a full 1GB of writes to flush out, perhaps 
because it extended a table and then wrote more than that to it.  And in 
everything but an SSD or giant SAN cache situation, 1GB of I/O is just 
too much to fsync at a time without the OS choking a little on it.
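
To make that counter idea concrete, a toy version might look like this (the
threshold and the per-fd array are invented; the real absorb code tracks
relation files rather than raw descriptors):

#define _GNU_SOURCE
#include <fcntl.h>

#define NUDGE_EVERY     256     /* made-up "N calls" threshold */
#define MAX_TRACKED_FDS 1024

static unsigned absorb_counts[MAX_TRACKED_FDS];

/* Called once per absorbed fsync request for the file behind fd. */
static void
absorb_fsync_request(int fd)
{
    if (fd < 0 || fd >= MAX_TRACKED_FDS)
        return;
#ifdef SYNC_FILE_RANGE_WRITE
    if (++absorb_counts[fd] % NUDGE_EVERY == 0)
        (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#else
    ++absorb_counts[fd];
#endif
}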



I don't know that I have a suitable place to test this, and I'm not
quite sure what a good test setup would look like either, so while
I've tested that this appears to issue the right kernel calls, I am
not sure whether it actually fixes the problem case.


I'll put this into my testing queue after the upcoming CF starts.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-10 Thread Pavel Stehule
2012/1/11 Robert Haas robertmh...@gmail.com:
 On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?

 Yes.

 if b) then this might get a lot more interesting

 JSON is the most popular/likely way to represent that, I think.

 On that note, here's an updated version of the patch I posted
 upthread, with some regression tests and minimal documentation.

I like this patch and this feature.

I see only one issue - there is no functionality that helps generate
JSON in pg.

What do you think about functions like array_to_json(anyarray),
row_to_json(any) and format_json(text, text, ...)?

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers