Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name

2012-01-30 Thread Hitoshi Harada
On Sun, Jan 29, 2012 at 1:08 AM, Matthew Draper matt...@trebex.net wrote:
 On 25/01/12 18:37, Hitoshi Harada wrote:
 I'm still not sure whether to just revise (almost) all the SQL function
 examples to use parameter names, and declare them the right choice; as
 it's currently written, named parameters still seem rather second-class.

 Agreed.

 I'll try a more comprehensive revision of the examples.

 The patch seems ok, except an example I've just found.

 db1=# create function t(a int, t t) returns int as $$ select t.a $$
 language sql;
 CREATE FUNCTION
 db1=# select t(0, row(1, 2)::t);
  t
 ---
  1
 (1 row)

 Should we throw an error in such ambiguity? Or did you make it happen
 intentionally? If latter, we should also mention the rule in the
 manual.


 I did consider it, and felt it was the most consistent:

 # select t.x, t, z from (select 1) t(x), (select 2) z(t);
  x | t |  z
 ---+---+-----
  1 | 2 | (2)
 (1 row)


 I haven't yet managed to find the above behaviour described in the
 documentation either, though. To me, it feels like an obscure corner
 case, whose description would leave the rules seeming more complicated
 than they generally are.

Makes sense to me. I marked this as Ready for committer.
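
For the archives: given the resolution shown in the example above (t.a was
taken as the composite parameter t, not as function_name.parameter), the
ambiguity can also be sidestepped by choosing distinct parameter names or by
using positional references. A quick sketch, reusing the composite type t,
with purely illustrative names:

create function t2(a int, rec t) returns int as $$
    select rec.a      -- unambiguous: "rec" can only be the parameter
$$ language sql;

create function t3(a int, t t) returns int as $$
    select $2.a       -- positional reference avoids the name clash
$$ language sql;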

Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-30 Thread Kyotaro HORIGUCHI
Thank you for the comments; this is a revised version of the patch.

The gain in performance is more than expected. The measurement script
now runs the query via dblink ten times for more stable measurements,
so the figures are about ten times longer than the previous ones.

                       sec    % to Original
Original             : 31.5   100.0%
RowProcessor patch   : 31.3    99.4%
dblink patch         : 24.6    78.1%

The RowProcessor patch alone shows no loss (or a very small gain), and
the full patch gives us a 22% gain on the benchmark (*1).


The modifications are listed below.


- PGresAttValue is no longer used for this mechanism; PGrowValue has
  been added instead. PGresAttValue has been put back into
  libpq-int.h.

- pqAddTuple() is restored to its original form, and a new function
  pqAddRow() is added for use as the RowProcessor. (The previous
  pqAddTuple implementation had buggily mixed the two usages of
  PGresAttValue.)

- PQgetRowProcessorParam has been dropped. The contextual parameter
  is now passed as one of the parameters of RowProcessor().

- RowProcessor() returns int (as a bool; is that the libpq convention?)
  instead of void *. (Actually, void * had already become useless
  as of the previous patch.)

- PQsetRowProcessorErrMes() is changed to do strdup internally.

- The callers of RowProcessor() no longer set null_field to
  PGrowValue.value. Plus, the PGrowValue[] which RowProcessor()
  receives has nfields + 1 elements, to allow a rough size
  estimate via cols->value[nfields].value - cols->value[0].value -
  something.  The something here is 4 * nfields for protocol 3 and
  4 * (non-null fields) for protocol 2. I fear that this applies
  only to textual transfer usage...

- PQregisterRowProcessor() sets the default handler when given
  NULL. (pg_conn|pg_result).rowProcessor cannot be NULL for its
  lifetime.

- initStoreInfo() and storeHandler() have been provided with
  malloc error handling.


And more..

- getAnotherTuple()@fe-protocol2.c has not been tested at all.

- The uniform column sizes in the test data prevent realloc from
  ever being exercised in dblink... More testing should be done.
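
For reviewers who want to see the intended shape of the API, below is a
rough sketch of a caller. The RowProcessor callback signature is only my
reading of the descriptions above (and of the functions exported in the
diff), so treat it as illustration rather than as part of the patch:

#include <libpq-fe.h>

/* Hypothetical example callback; PGrowValue and RowProcessor are the
 * types this patch introduces. */
static int
my_row_processor(PGresult *res, PGrowValue *cols, void *param)
{
    int         nfields = PQnfields(res);

    /* cols[] has nfields + 1 elements, so a rough per-row size estimate
     * is available from the value pointers alone (protocol 3, text). */
    size_t      rough_size = (size_t) (cols[nfields].value
                                       - cols[0].value
                                       - 4 * nfields);

    /* ... copy the field data into storage owned by the caller ... */
    (void) rough_size;
    (void) param;

    return 1;                   /* non-zero = success, keep receiving rows */
}

void
install_row_processor(PGconn *conn, void *my_state)
{
    /* Passing NULL instead would restore the default handler (pqAddRow). */
    PQregisterRowProcessor(conn, my_row_processor, my_state);
}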


 regards,

=
(*1) The benchmark is done as follows,

==test.sql
select dblink_connect('c', 'host=localhost dbname=test');
select * from dblink('c', 'select a,c from foo limit 200') as (a text, b
bytea) limit 1;
...(repeat 9 times more)
select dblink_disconnect('c');
==

$ for i in $(seq 1 10); do time psql test -f t.sql; done

The environment is
  CentOS 6.2 on VirtualBox on Core i7 965 3.2GHz
  # of processor  1
  Allocated mem   2GB
  
Test DB schema is
   Column | Type  | Modifiers 
  --------+-------+-----------
   a      | text  | 
   b      | text  | 
   c      | bytea | 
  Indexes:
      "foo_a_bt" btree (a)
      "foo_c_bt" btree (c)

test=# select count(*),
   min(length(a)) as a_min, max(length(a)) as a_max,
   min(length(c)) as c_min, max(length(c)) as c_max from foo;

  count  | a_min | a_max | c_min | c_max 
---------+-------+-------+-------+-------
     200 |    29 |    29 |    29 |    29
(1 row)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..5ed083c 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,5 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterRowProcessor    161
+PQsetRowProcessorErrMes   162
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..4fe2f41 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,8 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->rowProcessor = pqAddRow;
+	conn->rowProcessorParam = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5078,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..82914fd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 
 
 /* 
@@ -160,6 +161,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessor = pqAddRow;
+	result->rowProcessorParam = NULL;
+	result->rowProcessorErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +198,10 @@ PQmakeEmptyPGresult(PGconn 

Re: [HACKERS] Hot standby off of hot standby?

2012-01-30 Thread Simon Riggs
On Mon, Jan 30, 2012 at 4:36 AM, Igor Schtein ischt...@gmail.com wrote:

 Is it possible to use a standby instance as a master/primary for another
 standby in Postgres 9.0?  In other words, does PG 9.0 support cascading
 standby configuration?

No, but 9.2 will support that feature, known as cascading replication.
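
If you want to experiment with the 9.2 development code, the cascaded
standby is configured like any other standby, except that primary_conninfo
points at the first standby rather than at the primary. A rough sketch
(host name and values purely illustrative):

  # recovery.conf on the second-tier standby
  standby_mode = 'on'
  primary_conninfo = 'host=first-standby.example.com port=5432 user=replication'

The first standby also needs max_wal_senders > 0 and a replication entry in
its pg_hba.conf, just as a primary would.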

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Remembering bug #6123

2012-01-30 Thread Simon Riggs
On Thu, Jan 12, 2012 at 4:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Also, what's the point of testing update_ctid?  I don't see that
 it matters whether the outdate was a delete or an update.

 The update_ctid code was a carry-over from my old, slightly
 different approach, which I failed to change as I should have.  I'll
 fix that along with the other.

 Actually, on reflection there might be a reason for checking
 update_ctid, with a view to allowing harmless cases.  I see
 these cases:

 * UPDATE finds a trigger already updated the row: must throw error
 since we can't apply the update.

 * UPDATE finds a trigger already deleted the row: arguably, we could
 let the deletion stand and ignore the update action.

 * DELETE finds a trigger already updated the row: must throw error
 since we can't apply the delete.

 * DELETE finds a trigger already deleted the row: arguably, there's
 no reason to complain.

 Don't know if that was your reasoning as well.  But if it is, then again
 the comment needs to cover that.


Just been reading this thread a little.

ISTM that seeing a SelfUpdated row on a cursor when we didn't use
WHERE CURRENT OF really ought to raise some kind of exception
condition like a WARNING or a NOTICE. That situation seems most likely
to be a bug or at least an unintended consequence.

As Tom points out, the multiple flavours of weirdness that can result
even if we do fix this are not things we should do silently. I think
we should do the best job we can without throwing an error, but we
must make some attempt to let the developer know we did that for them
so they can investigate.
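
For anyone who wants to reproduce the case being discussed, here is a
minimal sketch of the "DELETE finds a trigger already updated the row"
scenario (table, function and trigger names are made up):

  create table r (id int primary key, n int);
  insert into r values (1, 0);

  create function r_bump() returns trigger language plpgsql as $$
  begin
      update r set n = n + 1 where id = old.id;  -- outdates the target row
      return old;
  end $$;

  create trigger r_del before delete on r
      for each row execute procedure r_bump();

  delete from r where id = 1;   -- the DELETE now sees a self-updated row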

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-01-30 Thread Ashutosh Bapat
Thanks Tom for giving a stronger case. I found the problem while looking
at inherited tables, and didn't think beyond inherited tables. Since
inherited tables are expanded when the subquery planner is invoked, I thought
the problem would occur only in EXPLAIN output, as we won't generate queries
that can be used elsewhere after/during planning.

So, as I understand it, we have two problems here:
1. Prefixing the schema name to the fake aliases if there is another RTE with
the same name. There may not be a relation with that name (the fake alias
name given) in the schema chosen as the prefix.
2. Fake aliases themselves can be conflicting.

If I understand correctly, if we solve the second problem, the first problem
will not occur. Is that correct?

On Sat, Jan 28, 2012 at 8:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  It's a feature, not a bug, that we schema-qualify names when VERBOSE
  is specified.  That was done on purpose for the benefit of external
  tools that might need this information to disambiguate which object is
  being referenced.

  Table *aliases*, of course, should not be schema-qualified, but I
  don't think that's what we're doing.  You could make it more clear by
  including an alias in the query, like this:

  explain verbose select * into table ramp from road hwy where name ~
 '.*Ramp';

 I think you are both focusing on the wrong thing.  There is a lot of
 squishiness in what EXPLAIN prints out, since SQL notation is not always
 well suited to what an execution plan actually does.  But this code has
 a hard and fast requirement that it dump view definitions correctly,
 else pg_dump doesn't work.  And after looking at this I think Ashutosh
 has in fact found a bug.  Consider this example:

 regression=# create schema s1;
 CREATE SCHEMA
 regression=# create schema s2;
 CREATE SCHEMA
 regression=# create table s1.t1 (f1 int);
 CREATE TABLE
 regression=# create table s2.t1 (f1 int);
 CREATE TABLE
 regression=# create view v1 as
 regression-#   select * from s1.t1 where exists (
 regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
 regression(#   );
 CREATE VIEW
 regression=# \d+ v1
                  View "public.v1"
  Column |  Type   | Modifiers | Storage | Description 
 --------+---------+-----------+---------+-------------
  f1     | integer |           | plain   | 
 View definition:
  SELECT t1.f1
   FROM s1.t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.t1
  WHERE t1.f1 = s1.t1.f1));

 regression=# alter table s2.t1 rename to tx;
 ALTER TABLE
 regression=# \d+ v1
                  View "public.v1"
  Column |  Type   | Modifiers | Storage | Description 
 --------+---------+-----------+---------+-------------
  f1     | integer |           | plain   | 
 View definition:
  SELECT t1.f1
   FROM s1.t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.tx t1
  WHERE t1.f1 = s1.t1.f1));

 Both of the above displays of the view are formally correct, in that the
 variables will be taken to refer to the correct upper or lower RTE.
 But let's change that back and rename the other table:

 regression=# alter table s2.tx rename to t1;
 ALTER TABLE
 regression=# alter table s1.t1 rename to tx;
 ALTER TABLE
 regression=# \d+ v1
                  View "public.v1"
  Column |  Type   | Modifiers | Storage | Description 
 --------+---------+-----------+---------+-------------
  f1     | integer |           | plain   | 
 View definition:
  SELECT t1.f1
   FROM s1.tx t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.t1
  WHERE t1.f1 = s1.t1.f1));

 This is just plain wrong, as you'll see if you try to execute that
 query:

 regression=# SELECT t1.f1
 regression-#FROM s1.tx t1
 regression-#   WHERE (EXISTS ( SELECT 1
 regression(#FROM s2.t1
 regression(#   WHERE t1.f1 = s1.t1.f1));
 ERROR:  invalid reference to FROM-clause entry for table "t1"
 LINE 5:   WHERE t1.f1 = s1.t1.f1));
^
 HINT:  There is an entry for table "t1", but it cannot be referenced
 from this part of the query.

 (The HINT is a bit confused here, but the query is certainly invalid.)

 So what we have here is a potential failure to dump and reload view
 definitions, which is a lot more critical in my book than whether
 EXPLAIN's output is confusing.

 If we stick with the existing rule for attaching a fake alias to renamed
 RTEs, I think that Ashutosh's patch or something like it is probably
 appropriate, because the variable-printing code ought to be in step with
 the RTE-printing code.  Unfortunately, I think the hack to attach a fake
 alias to renamed RTEs creates some issues of its own.  Consider

select * from s1.t1
  where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);

 If s1.t1 is now renamed to s1.tx, it is still possible to express
 the same semantics:

select * from s1.tx
  where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);

 But when we attach a fake alias, it's broken:



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-30 Thread horiguchi . kyotaro
I'm sorry.

 Thank you for the comments; this is a revised version of the patch.

The malloc error handling in dblink.c of the patch is broken. It
is leaking memory and breaking control flow.

I'll re-send a properly fixed patch for dblink.c later.

# This severe back pain should have made me stupid :-p

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Hot standby off of hot standby?

2012-01-30 Thread Igor Schtein
Hi,

Is it possible to use a standby instance as a master/primary for another
standby in Postgres 9.0?  In other words, does PG 9.0 support cascading
standby configuration?

Thanks,
Igor


Re: [HACKERS] JSON for PG 9.2

2012-01-30 Thread Abhijit Menon-Sen
At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote:

 I've started reviewing this patch, but it'll take me a bit longer to go
 through json.c properly.

OK, I finished reading json.c. I don't have an answer to the detoasting
question in the XXX comment, but the code looks fine.

Aside: is query_to_json really necessary? It seems rather ugly and
easily avoidable using row_to_json.

-- ams



Re: [HACKERS] Group commit, revised

2012-01-30 Thread Heikki Linnakangas

On 27.01.2012 15:38, Robert Haas wrote:

On Fri, Jan 27, 2012 at 8:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Yeah, we have to be careful with any overhead in there, it can be a hot
spot. I wouldn't expect any measurable difference from the above, though.
Could I ask you to rerun the pgbench tests you did recently with this patch?
Or can you think of a better test for this?


I can't do so immediately, because I'm waiting for Nate Boley to tell
me I can go ahead and start testing on that machine again.  But I will
do it once I get the word.


I committed this. I ran pgbench test on an 8-core box and didn't see any 
slowdown. It would still be good if you get a chance to rerun the bigger 
test, but I feel confident that there's no measurable slowdown.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Multithread Query Planner

2012-01-30 Thread Robert Haas
On Fri, Jan 27, 2012 at 2:56 PM, Pierre C li...@peufeu.com wrote:
 Right.  I think it makes more sense to try to get parallelism working
 first with the infrastructure we have.  Converting to use threading,
 if we ever do it at all, should be something we view as a later
 performance optimization.  But I suspect we won't want to do it
 anyway; I think there will be easier ways to get where we want to be.

 Multithreading got fashionable with the arrival of the Dual-core CPU a few
 years ago. However, multithreading as it is used currently has a huge
 problem : usually, threads share all of their memory. This opens the door to
 an infinite number of hard to find bugs, and more importantly, defeats the
 purpose.

 Re-entrant palloc() is nonsense. Suppose you can make a reentrant palloc()
 which scales OK at 2 threads thanks to a cleverly placed atomic instruction.
 How is it going to scale on 64 cores ? On HP's new 1000-core ARM server with
 non-uniform memory access ? Probably it would suck very very badly... not to
 mention the horror of multithreaded exception-safe deallocation when 1
 thread among many blows up on an error...

There are academic papers out there on how to build a thread-safe,
highly concurrent memory allocator.  You seem to be assuming that
everyone doing allocations would need to compete for access to a
single freelist, or something like that, which is simply not true.  A
lot of effort and study has been put into figuring out how to get past
bottlenecks in this area, because there is a lot of multi-threaded
code out there that needs to surmount these problems.  I don't believe
that the problem is that it can't be done, but rather that we haven't
done it.

 For the ultimate in parallelism, ask a FPGA guy. Is he using shared memory
 to wire together his 12000 DSP blocks ? Nope, he's using isolated Processes
 which share nothing and communicate through FIFOs and hardware message
 passing. Like shell pipes, basically. Or Erlang.

I'm not sure we can conclude much from this example.  The programming
style of people using FPGAs is probably governed by the nature of the
interface and the type of computation they are doing rather than
anything else.

 Good parallelism = reduce shared state and communicate through data/message
 channels.

 Shared-everything multithreading is going to be in a lot of trouble on
 future many-core machines. Incidentally, Postgres, with its Processes,
 sharing only what is needed, has a good head start...

 With more and more cores coming, you guys are going to have to fight to
 reduce the quantity of shared state between processes, not augment it by
 using shared memory threads !...

I do agree that it's important to reduce shared state.  We've seen
some optimizations this release cycle that work precisely because they
cut down on the rate at which cache lines must be passed between
cores, and it's pretty clear that we need to go farther in that
direction.  On the other hand, I think it's a mistake to confuse the
programming model with the amount of shared state.  In a
multi-threaded programming model there is likely to be a lot more
memory that is technically shared in the sense that any thread could
technically access it.  But if the application is coded in such a way
that actual sharing is minimal, then it's not necessarily any worse
than a process model as far as concurrency is concerned.  Threading
provides a couple of key advantages which, with our process model, we
can't get: it avoids the cost of a copy-on-write operation every time
a child is forked, and it allows arbitrary amounts of memory rather
than being limited to a single shared memory segment that must be
sized in advance.  The major disadvantage is really with robustness,
not performance, I think: in a threaded environment, with a shared
address space, the consequences of a random memory stomp will be less
predictable.

 Say you want to parallelize sorting.
 Sorting is a black-box with one input data pipe and one output data pipe.
 Data pipes are good for parallelism, just like FIFOs. FPGA guys love black
 boxes with FIFOs between them.

 Say you manage to send tuples through a FIFO like zeromq. Now you can even
 run the sort on another machine and allow it to use all the RAM if you like.
 Now split the black box in two black boxes (qsort and merge), instantiate as
 many qsort boxes as necessary, and connect that together with pipes. Run
 some boxes on some of this machine's cores, some other boxes on another
 machine, etc. That would be very flexible (and scalable).

 Of course the black box has a small backdoor : some comparison functions can
 access shared state, which is basically *the* issue (not reentrant stuff,
 which you do not need).

Well, you do need reentrant stuff, if the comparator does anything
non-trivial.  It's easy to imagine that comparing strings or dates or
whatever is a trivial operation that's done without allocating any
memory or throwing any errors, but it's not really 

Re: [HACKERS] Configuring Postgres to Add A New Source File

2012-01-30 Thread Robert Haas
On Fri, Jan 27, 2012 at 6:37 PM, Tareq Aljabban
tareq.aljab...@gmail.com wrote:
 Indeed, I'm a beginner in Make, but I read a few tutorials and was able to
 do what I wanted outside of PG using a simple make file.
 Now, when moving to PG, I found the Make structure much more complicated and
 didn't know where to add my configuration.
 I'm looking only for this file to run in PG (the required functionality is
 done already). My code will be a part of the backend, so I want to keep it
 there.

Uh, no it won't.  The compile command you showed had it creating a
standalone executable, hdfs_test.

You're likely to find that linking a JVM into the backend
significantly decreases the overall robustness of the system.  For
example, ^C isn't going to be able to abort out of processing a query
that's off doing something inside the JVM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simulating Clog Contention

2012-01-30 Thread Robert Haas
On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think that even in normal (non-initialization) usage, this message
 should be suppressed when the provided scale factor
 is equal to the pgbench_branches table count.

 The attached patch does just that.  There is probably no reason to
 warn people that we are doing what they told us to, but not for the
 reason they think.

In my opinion, a more sensible approach than anything we're doing
right now would be to outright *reject* options that will only be
ignored.  If -s isn't supported except with -i, then trying to specify
-s without -i should just error out at the options-parsing stage,
before we even try to connect to the database.  It's not very helpful
to accept options and then ignore them, and we have many instances of
that right now: initialization-only switches are accepted and ignored
when not initializing, and run-only switches are accepted and ignored
with initializing.
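
To make that concrete, the split is roughly this (scale, client count and
database name are illustrative):

  $ pgbench -i -s 40 bench            # initialization: -s takes effect
  $ pgbench -c 10 -T 60 bench         # run: scale comes from pgbench_branches
  $ pgbench -s 40 -c 10 -T 60 bench   # run: -s is accepted but ignored today;
                                      # under my proposal this would error out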

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-30 Thread Robert Haas
On Fri, Jan 27, 2012 at 4:07 PM, Dan Scales sca...@vmware.com wrote:
 The advantage of putting the checksum calculation in smgrwrite() (or 
 mdwrite()) is that it catches a bunch of page writes that don't go through 
 the buffer pool (see calls to smgrwrite() in nbtree.c, nbtsort.c, spginsert.c)

Maybe we should have some sort of wrapper function, then.  I really
dislike the idea that the smgr layer knows anything about the page
format, and if md has to know that's even worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Group commit, revised

2012-01-30 Thread Simon Riggs
On Mon, Jan 30, 2012 at 2:57 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 I committed this. I ran pgbench test on an 8-core box and didn't see any
 slowdown. It would still be good if you get a chance to rerun the bigger
 test, but I feel confident that there's no measurable slowdown.

I asked clearly and specifically for you to hold back committing
anything. Not sure why you would ignore that and commit without
actually asking myself or Peter. On a point of principle alone, I
think you should revert. Working together is difficult if
communication channels are openly ignored and disregarded.

Peter and I have been working on a new version that seems likely to
improve performance over your suggestions. We should be showing
something soon.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Group commit, revised

2012-01-30 Thread Heikki Linnakangas

On 30.01.2012 17:18, Simon Riggs wrote:

On Mon, Jan 30, 2012 at 2:57 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


I committed this. I ran pgbench test on an 8-core box and didn't see any
slowdown. It would still be good if you get a chance to rerun the bigger
test, but I feel confident that there's no measurable slowdown.


I asked clearly and specifically for you to hold back committing
anything. Not sure why you would ignore that and commit without
actually asking myself or Peter. On a point of principle alone, I
think you should revert. Working together is difficult if
communication channels are openly ignored and disregarded.


You must be referring to this:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01406.php

What I committed in the end was quite different from the version that
mail was in reply to, too. If you have a specific objection to the patch as
committed, please let me know.



Peter and I have been working on a new version that seems likely to
improve performance over your suggestions. We should be showing
something soon.


Please post those ideas, and let's discuss them. If it's something 
simple, maybe we can still sneak them into this release. Otherwise, 
let's focus on the existing patches that are pending review or commit.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] JSON for PG 9.2

2012-01-30 Thread Andrew Dunstan



On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote:

At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote:

I've started reviewing this patch, but it'll take me a bit longer to go
through json.c properly.

OK, I finished reading json.c. I don't have an answer to the detoasting
question in the XXX comment, but the code looks fine.



Looking at somewhat analogous code in xml.c, it doesn't seem to be done
there. So maybe we don't need to worry about it.




Aside: is query_to_json really necessary? It seems rather ugly and
easily avoidable using row_to_json.



I started with this, again by analogy with query_to_xml(). But I agree 
it's a bit ugly. If we're not going to do it, then we definitely need to 
look at caching the output funcs in the function info. A closer 
approximation is actually:


   SELECT array_to_json(array_agg(q))
   FROM ( your query here ) q;


But then I'd want the ability to break that up a bit with line feeds, so 
we'd need to adjust the interface slightly. (Hint: don't try the above 
with select * from pg_class.)
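
For example, with a small result set (a LIMIT keeps the pg_class case from
the hint above manageable; the column list is arbitrary):

   SELECT array_to_json(array_agg(q))
     FROM (SELECT relname, relkind FROM pg_class LIMIT 3) q;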



I'll wait on further comments, but I can probably turn these changes 
around very quickly once we're agreed.



Cheers

andrew




Re: [HACKERS] Simulating Clog Contention

2012-01-30 Thread Jeff Janes
On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think that even in normal (non-initialization) usage, this message
 should be suppressed when the provided scale factor
 is equal to the pgbench_branches table count.

 The attached patch does just that.  There is probably no reason to
 warn people that we are doing what they told us to, but not for the
 reason they think.

 In my opinion, a more sensible approach than anything we're doing
 right now would be to outright *reject* options that will only be
 ignored.  If -s isn't supported except with -i, then trying to specify
 -s without -i should just error out at the options-parsing stage,
 before we even try to connect to the database.  It's not very helpful
 to accept options and then ignore them, and we have many instances of
 that right now: initialization-only switches are accepted and ignored
 when not initializing, and run-only switches are accepted and ignored
 with initializing.

I like the ability to say, effectively, "I think I previously did
the initialization with -s 40; if I actually didn't, then scream at me,
and if I did, then go ahead and do the pgbench run I just asked for."
But, since it does unconditionally report the scale actually used and
I just have to have the discipline to go look at the result, I can see
where this is perhaps overkill.   In my own (non-PG-related) code,
when I have tasks that have to be run in multiple phases that can get
out of sync if I am not careful, I like to be able to specify the
flags even in the unused invocation, so that the code can verify I
am being consistent.  Code is better at that than I am.


I'm not sure I know what all would be incompatible with what.  I could
start drawing that matrix up once the API stabilizes, but I think you
are still planning on whacking this -I option around a bit.

Cheers,

Jeff



Re: [HACKERS] Simulating Clog Contention

2012-01-30 Thread Jeff Janes
On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2012-01-12 12:31:20 +, si...@2ndquadrant.com wrote:

 The following patch adds a pgbench option -I to load data using
 INSERTs

 This is just to confirm that the patch applies and builds and works
 fine (though of course it does take a long time… pity there doesn't
 seem to be any easy way to add progress indication like -i has).

I was thinking the opposite.  That -i should only print progress
indication when -d is given.  Or at least knock an order of magnitude
or two off of how often it does so.

Cheers,

Jeff



Re: [HACKERS] Group commit, revised

2012-01-30 Thread Jeff Janes
On Sun, Jan 29, 2012 at 1:20 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 01/28/2012 07:48 PM, Jeff Janes wrote:


 I haven't inspected that deep fall off at 30 clients for the patch.
 By way of reference, if I turn off synchronous commit, I get
 tps=1245.8 which is 100% CPU limited.  This sets a theoretical upper
 bound on what could be achieved by the best possible group committing
 method.


 This sort of thing is why I suspect that to completely isolate some results,
 we're going to need a moderately high end server--with lots of
 cores--combined with an intentionally mismatched slow drive.  It's too easy
 to get pgbench and/or PostgreSQL to choke on something other than I/O when
 using smaller core counts.  I don't think I have anything where the floor is
 24 TPS per client though.  Hmmm...I think I can connect an IDE drive to my
 MythTV box and format it with ext4.  Thanks for the test idea.

 One thing you could try on this system is using the -N option ("Do not update
 pgbench_tellers and pgbench_branches").  That eliminates a lot of the
 contention that might be pulling down your higher core count tests, while
 still giving a completely valid test of whether the group commit mechanism
 works.  Not sure whether that will push up the top-end usefully for you,
 worth a try if you have time to test again.

Adding the -N did eliminate the fall-off at 30 clients for
group_commit patch.  But, I still want to explore why the fall off
occurs when I get a chance.  I know why the curve would stop going up
without using -N (with -s of 40 and -c of 30, many connections will be
waiting on row locks for updates to pgbench_branches) but that should
cause a leveling off, not a collapse.

Other than the lack of drop off at 30 clients, -N didn't meaningfully
change anything.  Everyone got slightly faster except at -c1.


 If the group_commit patch goes in, would we then rip out commit_delay
 and commit_siblings?


 The main reason those are still hanging around at all are to allow pushing
 on the latency vs. throughput trade-off on really busy systems.  The use
 case is that you expect, say, 10 clients to constantly be committing at a
 high rate.  So if there's just one committing so far, assume it's the
 leading edge of a wave and pause a bit for the rest to come in.  I don't
 think the cases where this is useful behavior--people both want it and the
 current mechanism provides it--are very common in the real world.
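
(For reference, the two knobs being discussed look like this in
postgresql.conf; the values are purely illustrative:)

  commit_delay = 50        # microseconds to wait before flushing WAL
  commit_siblings = 5      # only wait if at least this many other
                           # transactions are open at commit time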

The tests I did are exactly the environment where commit_delay might
be expected to help.  And it did help, just not all that much.  One
of the problems is that while it does wait for those others to come in
and then flushes them in one fsync, often the others never
get woken up promptly to realize that they have already been
flushed.  They continue to block.

The group_commit patch, on the other hand, accomplishes exactly what
commit_delay was intended to accomplish but doesn't do a very good job
of.

With the -N option, I also used commit_delay on top of group_commit,
and the difference between the two look like it was within the margin
of error.  So commit_delay did not obviously cause further
improvement.


 It can be
 useful for throughput oriented benchmarks though, which is why I'd say it
 hasn't been killed off yet.

 We'll have to see whether the final form this makes sense in will usefully
 replace that sort of thing.  I'd certainly be in favor of nuking
 commit_delay and commit_siblings with a better option; it would be nice if
 we don't eliminate this tuning option in the process though.

But I'm pretty sure that group_commit has stolen that thunder.
Obviously a few benchmarks on one system isn't enough to prove that,
though.

The only use case I see left for commit_delay is where it is set on a
per-connection basis rather than system-wide.  Once you start a fsync,
everyone who missed the bus is locked out until the next one.  So
low-priority connections can set commit_delay so as not to trigger the
bus to leave before the high priority process gets on.  But that seems
like a pretty tenuous use case with better ways to do it.

Cheers,

Jeff



Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-01-30 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 So, as I understand it, we have two problems here:
 1. Prefixing the schema name to the fake aliases if there is another RTE with
 the same name. There may not be a relation with that name (the fake alias
 name given) in the schema chosen as the prefix.
 2. Fake aliases themselves can be conflicting.

Well, the issue is more that a fake alias might unintentionally collide
with a regular alias elsewhere in the query.  There's no guard against
that in the current behavior, and ISTM there needs to be.

The one possibly-simplifying thing about this whole issue is that we
needn't cater for references that couldn't have been made in the
original query.  For instance, if the inner and outer queries both have
explicit aliases tx, it's impossible for the inner query to have
referred to any columns of the outer tx --- so we don't have to try to
make it possible in the dumped form.

 If I understand correctly, if we solve the second problem, the first problem
 will not occur. Is that correct?

I don't believe that the problem has anything to do with the names of
other tables that might happen to exist in the database.  It's a matter
of what RTE names/aliases are exposed for variable references in
different parts of the query.

regards, tom lane



Re: [HACKERS] patch for parallel pg_dump

2012-01-30 Thread Robert Haas
On Sun, Jan 29, 2012 at 12:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joachim Wieland j...@mcknight.de writes:
 I know that you took back some of your comments, but I'm with you
 here. Archive is allocated as an ArchiveHandle and then casted back to
 Archive*, so you always know that an Archive is an ArchiveHandle. I'm
 all for getting rid of Archive and just using ArchiveHandle throughout
 pg_dump which would get rid of these useless casts.

 I'd like to see a more thoroughgoing look at the basic structure of
 pg_dump.  Everybody who's ever looked at that code has found it
 confusing, with the possible exception of the original author who is
 long gone from the project anyway.  I don't know exactly what would make
 it better, but the useless distinction between Archive and ArchiveHandle
 seems like a minor annoyance, not the core disease.

 Not that there'd be anything wrong with starting with that.

After some study, I'm reluctant to completely abandon the
Archive/ArchiveHandle distinction because it seems to me that it does
serve some purpose: right now, nothing in pg_dump.c - where all the
code to actually dump stuff lives - knows anything about what's inside
the ArchiveHandle, just the Archive.  So having two data structures
really does serve a purpose: if it's part of the Archive, you need it
in order to query the system catalogs and generate SQL.  If it isn't,
you don't.  Considering how much more crap there is inside an
ArchiveHandle than an Archive, I don't think we should lightly abandon
that distinction.

Now, that having been said, there are probably better ways of making
that distinction than what we have here; Archive and ArchiveHandle
could be better named, perhaps, and we could have pointers from one
structure to the other instead of magically embedding them inside each
other.  All the function pointers that are part of the ArchiveHandle
could be separated out into a static, constant structure with a name
like ArchiveMethod, and we could keep a pointer to the appropriate
ArchiveMethod inside each ArchiveHandle instead of copying all the
pointers into it.  I think that'd be a lot less confusing.
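
As a rough illustration of that last idea (the struct and member names here
are invented for the sketch, and Archive refers to pg_dump's existing public
struct, so this is not a drop-in definition):

  typedef struct _archiveHandle ArchiveHandle;

  typedef struct ArchiveMethod
  {
      void (*WriteData) (ArchiveHandle *AH, const void *data, size_t len);
      void (*CloseArchive) (ArchiveHandle *AH);
      /* ... one member per callback currently copied into ArchiveHandle ... */
  } ArchiveMethod;

  struct _archiveHandle
  {
      Archive              public_part;  /* the part pg_dump.c gets to see */
      const ArchiveMethod *method;       /* one shared, static table per format */
      /* ... remaining private state (connections, buffers, etc.) ... */
  };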

But the immediate problem is that pg_dump.c is heavily reliant on
global variables, which isn't going to fly if we want this code to use
threads on Windows (or anywhere else).  It's also bad style.  So I
suggest that we refactor pg_dump.c to get rid of g_conn and g_fout.
Getting rid of g_fout should be fairly easy: in many places, we're
already passing Archive *fout as a parameter.  If we pass it as a
parameter to the functions that need it but aren't yet getting it as a
parameter, then it can cease to exist as a global variable and become
local to main().

Getting rid of g_conn is a little harder.  Joachim's patch takes the
view that we can safely overload the existing ArchiveHandle.connection
member.  Currently, that member is the connection to which we are
doing a direct to database restore; what he's proposing to do is also
use it to store the connection from which are doing the dump.  But it
seems possible that we might someday want to dump from one database
and restore into another database at the same time, so maybe we ought
to play it safe and use different variables for those things.  So I'm
inclined to think we could just add a PGconn *remote_connection
argument to the Archive structure (to go with all of the similarly
named remote fields, all of which describe the DB to be dumped) and
then that would be available everywhere that the Archive itself is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simulating Clog Contention

2012-01-30 Thread Robert Haas
On Mon, Jan 30, 2012 at 10:53 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2012-01-12 12:31:20 +, si...@2ndquadrant.com wrote:

 The following patch adds a pgbench option -I to load data using
 INSERTs

 This is just to confirm that the patch applies and builds and works
 fine (though of course it does take a long time… pity there doesn't
 seem to be any easy way to add progress indication like -i has).

 I was thinking the opposite.  That -i should only print progress
 indication when -d is given.  Or at least knock an order of magnitude
 or two off of how often it does so.

I'd be all in favor of having -i emit progress reports 10x less
often; even on a laptop, the current reports are very chatty.  But I
think 100x less often might be taking it too far.

Either way, if we're going to have an option for inserts, they should
produce the same progress reports that COPY does - though possibly
more often, since I'm guessing it's likely to be way slower.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simulating Clog Contention

2012-01-30 Thread Robert Haas
On Mon, Jan 30, 2012 at 10:48 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I think that even in normal (non-initialization) usage, this message
 should be suppressed when the provided scale factor
 is equal to the pgbench_branches table count.

 The attached patch does just that.  There is probably no reason to
 warn people that we are doing what they told us to, but not for the
 reason they think.

 In my opinion, a more sensible approach than anything we're doing
 right now would be to outright *reject* options that will only be
 ignored.  If -s isn't supported except with -i, then trying to specify
 -s without -i should just error out at the options-parsing stage,
 before we even try to connect to the database.  It's not very helpful
 to accept options and then ignore them, and we have many instances of
 that right now: initialization-only switches are accepted and ignored
 when not initializing, and run-only switches are accepted and ignored
 with initializing.

 I like the ability to say, effectively, "I think I previously did
 the initialization with -s 40; if I actually didn't, then scream at me,
 and if I did, then go ahead and do the pgbench run I just asked for."
 But, since it does unconditionally report the scale actually used and
 I just have to have the discipline to go look at the result, I can see
 where this is perhaps overkill.   In my own (non-PG-related) code,
 when I have tasks that have to be run in multiple phases that can get
 out of sync if I am not careful, I like to be able to specify the
 flags even in the unused invocation, so that the code can verify I
 am being consistent.  Code is better at that than I am.

I guess my point is - if we're gonna have it do that, then shouldn't
we actually *error out* if the scales don't match, not just print a
warning and then charge ahead?

 I'm not sure I know what all would be incompatible with what.  I could
 start drawing that matrix up once the API stabilizes, but I think you
 are still planning on whacking this -I option around a bit.

It's mostly that everything in the initialization options section of
pgbench --help is incompatible with everything in the benchmarking
options section, and the other way around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread hubert depesz lubaczewski
On Wed, Jan 25, 2012 at 11:30:49AM -0500, Tom Lane wrote:
 hubert depesz lubaczewski dep...@depesz.com writes:
  anyway - the point is that in \df date_part(, timestamp) says it's
  immutable, while it is not.
 
 Hmm, you're right.  I thought we'd fixed that way back when, but
 obviously not.  Or maybe the current behavior of the epoch case
 postdates that.

is there a chance something will happen with/about it?

Preferably I would like extract( epoch from timestamp ) to be really
immutable, i.e. (in my opinion) it should treat the incoming data as UTC
for the epoch calculation.
Alternatively, perhaps epoch extraction could be moved to a specialized
function, which would have swapped mutability:

get_epoch(timestamptz) would be immutable
while
get_epoch(timestamp) would be stable
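
A user-level sketch of that alternative (the names are just the ones
proposed above; the timestamptz version can honestly be immutable, because
the epoch of a timestamptz does not depend on the TimeZone setting):

create function get_epoch(timestamptz) returns double precision
    language sql immutable
    as $$ select extract(epoch from $1) $$;

create function get_epoch(timestamp) returns double precision
    language sql stable         -- result depends on the TimeZone setting
    as $$ select extract(epoch from $1) $$;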

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/



Re: [HACKERS] Hot standby off of hot standby?

2012-01-30 Thread Josh Berkus
On 1/29/12 8:36 PM, Igor Schtein wrote:
 Hi,
 
 Is it possible to use a standby instance as a master/primary for another
 standby in Postgres 9.0?  In other words, does PG 9.0 support cascading
 standby configuration?

No, that's a 9.2 feature in development.

If you can build PostgreSQL from source and apply patches, we could use
your help testing it!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-30 Thread Marko Kreen
On Mon, Jan 30, 2012 at 06:06:57PM +0900, Kyotaro HORIGUCHI wrote:
 The gain in performance is more than expected. The measurement script
 now runs the query via dblink ten times for more stable measurements,
 so the figures are about ten times longer than the previous ones.
 
                        sec    % to Original
 Original             : 31.5   100.0%
 RowProcessor patch   : 31.3    99.4%
 dblink patch         : 24.6    78.1%
 
 The RowProcessor patch alone shows no loss (or a very small gain), and
 the full patch gives us a 22% gain on the benchmark (*1).

Excellent!

 - The callers of RowProcessor() no longer set null_field to
   PGrowValue.value. Plus, the PGrowValue[] which RowProcessor()
   receives has nfields + 1 elements, to allow a rough size
   estimate via cols->value[nfields].value - cols->value[0].value -
   something.  The something here is 4 * nfields for protocol 3 and
   4 * (non-null fields) for protocol 2. I fear that this applies
   only to textual transfer usage...

Exact estimate is not important here.  And the (nfields + 1) element
feels like a bit too much magic, considering that most users probably
do not need it.  Without it, the logic would be:

 total = last.value - first.value + ((last.len > 0) ? last.len : 0)

which isn't too complex.  So I think we can remove it.


= Problems =

* Remove the dubious memcpy() in pqAddRow()

* I think the dynamic arrays in getAnotherTuple() are not portable enough;
  please do proper allocation for the array.  I guess in PQsetResultAttrs()?


= Minor notes =

These can be argued either way, if you don't like some
suggestion, you can drop it.

* Move PQregisterRowProcessor() into fe-exec.c, then we can make
  pqAddRow static.

* Should PQclear() free the RowProcessor error msg?  It seems
  it should not get outside of getAnotherTuple(), but
  that's not certain.  Perhaps it would be clearer to free
  it here too.

* Remove the part of comment in getAnotherTuple():
   * Buffer content may be shifted on reloading additional
   * data. So we must set all pointers on every scan.

  It's confusing why it needs to clarify that, as there
  is nobody expecting it.

* PGrowValue documentation should mention that ->value pointer
  is always valid.

* dblink: Perhaps some of those mallocs() could be replaced
  with pallocs() or even StringInfo, which already does
  the realloc dance?  I'm not familiar with dblink, and
  various struct lifetimes there, so I don't know if that
  actually makes sense or not.
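
  (For reference, the StringInfo idiom I mean is roughly the usual
  server-side pattern below; the wrapper function is just illustrative.)

      #include "postgres.h"
      #include "lib/stringinfo.h"

      static char *
      collect_field(const char *value, int len)
      {
          StringInfoData buf;

          initStringInfo(&buf);                     /* palloc's an initial buffer */
          appendBinaryStringInfo(&buf, value, len); /* grows it as needed */
          return buf.data;                          /* palloc'd; free with pfree() */
      }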


It seems this patch is getting ReadyForCommitter soon...

-- 
marko




Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-30 Thread Robert Haas
On Sun, Jan 29, 2012 at 7:27 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/1/28 Kohei KaiGai kai...@kaigai.gr.jp:
 2012/1/26 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 26, 2012 at 2:07 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/1/26 Robert Haas robertmh...@gmail.com:
 I'm wondering if a function would be a better fit than a GUC.  I don't
 think you can really restrict the ability to revert a GUC change -
 i.e. if someone does a SET and then a RESET, you pretty much have to
 allow that.  I think.  But if you expose a function then it can work
 however you like.

 One benefit to use GUC is that we can utilize existing mechanism to
 revert a value set within a transaction block on error.
 If we implement same functionality with functions, XactCallback allows
 sepgsql to get control on appropriate timing?

 Not sure, but I thought the use case was to set this at connection
 startup time and then hand the connection off to a client.  What keeps
 the client from just issuing RESET?

 In the use case of Joshua, the original security label has no privileges
 to reference any tables, and it cannot transition to any domain without
 the credential within IC-cards. Thus, RESET is harmless.

 However, I also assume one other possible use-case: the originator
 has privileges to transition to 10 different domains, but is not allowed
 to revert. In this case, RESET without permission checks is harmful.
 (This scenario would suit a multi-category model.)

 Apart from this issue, I found a problem in the implementation using GUC
 options. It makes the backend crash if it tries to reset sepgsql.client_label
 without permission within an error handler, because of double errors.

 I found an idea to avoid this scenario.

 The problematic case is resetting the GUC variable because of a transaction
 rollback combined with a permission violation; however, we don't intend to
 apply permission checks there, since the value was never committed.
 Thus, I considered checking the state of the transaction using
 IsTransactionState() in the assign_hook of the GUC variable.

 Unlike a function-based implementation, this can use the existing
 infrastructure to make the client_label transaction-aware.

 It shall be implemented as:

 void
 sepgsql_assign_client_label(const char *newval, void *extra)
 {
    if (!IsTransactionState())
        return;

    /* check whether valid security context */

    /* check permission to switch the current context */
 }

 How about such an implementation?

Well, I think the idea that GUC changes can be reverted at any time is
fairly deeply ingrained in the GUC machinery.  So I think that's a
hack: it might happen to work today (or at least on the cases we can
think of to test), but I wouldn't be a bit surprised if there are
other cases where it fails, either now or in the future.  I don't see
any good reason to assume that unrevertable changes are OK even inside
a transaction context.  For example, if someone applies a context to a
function with ALTER FUNCTION, they're going to expect that the altered
context remains in effect while the function is running and gets
reverted on exit.  If you throw an error when the system tries to
revert the change at function-exit time, it might not crash the
server, but it's certainly going to violate the principle of least
astonishment.

Of course, we already have a trusted procedure mechanism which is
intended to support temporary changes to the effective security label,
so you might say, hey, people shouldn't do that.  But they might.  And
I wouldn't like to bet that's the only case that could be problematic
either.  What about a setting in postgresql.conf?  You could end up
being asked to set the security label to some other value before
you've initialized it.  What about SET LOCAL?  It's not OK to fail to
revert that at transaction exit.

Hence my suggestion of a function: if you use functions, you can
implement whatever semantics you want.
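
Something along these lines, for instance (the function name and label are
purely illustrative, not an existing API):

  SELECT sepgsql_set_client_label('user_u:user_r:user_webapp_t:s0');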

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.

2012-01-30 Thread Robert Haas
On Mon, Jan 30, 2012 at 9:55 AM, Heikki Linnakangas
heikki.linnakan...@iki.fi wrote:
 Make group commit more effective.

 When a backend needs to flush the WAL, and someone else is already flushing
 the WAL, wait until it releases the WALInsertLock and check if we still need
 to do the flush or if the other backend already did the work for us, before
 acquiring WALInsertLock. This helps group commit, because when the WAL flush
 finishes, all the backends that were waiting for it can be woken up in one
 go, and they can all concurrently observe that they're done, rather than
 waking them up one by one in a cascading fashion.

 This is based on a new LWLock function, LWLockWaitUntilFree(), which has
 peculiar semantics. If the lock is immediately free, it grabs the lock and
 returns true. If it's not free, it waits until it is released, but then
 returns false without grabbing the lock. This is used in XLogFlush(), so
 that when the lock is acquired, the backend flushes the WAL, but if it's
 not, the backend first checks the current flush location before retrying.

 Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
 this patch as committed ended up being very different from that.

Either this patch, or something else committed this morning, is
causing make check to hang or run extremely slowly for me.  I think
it's this patch, because I attached to a backend and stopped it a few
times, and all the backtraces look like this:

#0  0x7fff8a545b22 in semop ()
#1  0x0001001ff8df in PGSemaphoreLock (sema=0x103d7de70,
interruptOK=0 '\0') at pg_sema.c:418
#2  0x00010024d7dd in LWLockWaitUntilFree (lockid=<value
temporarily unavailable, due to optimizations>, mode=<value
temporarily unavailable, due to optimizations>) at lwlock.c:666
#3  0x00010005d3b3 in XLogFlush (record=<value temporarily
unavailable, due to optimizations>) at xlog.c:2148
#4  0x0001000506bb in CommitTransaction () at xact.c:1113
#5  0x000100050b35 in CommitTransactionCommand () at xact.c:2613
#6  0x00010025a403 in finish_xact_command () at postgres.c:2388
#7  0x00010025d525 in exec_simple_query (query_string=0x101055638
CREATE INDEX wowidx ON test_tsvector USING gin (a);) at
postgres.c:1052
#8  0x00010025dfc1 in PostgresMain (argc=2, argv=value
temporarily unavailable, due to optimizations, username=value
temporarily unavailable, due to optimizations) at postgres.c:3881
#9  0x00010020c258 in ServerLoop () at postmaster.c:3587
#10 0x00010020d167 in PostmasterMain (argc=6, argv=0x100d08f40) at
postmaster.c:1110
#11 0x00010019e745 in main (argc=6, argv=0x100d08f40) at main.c:199

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch pg_is_in_backup()

2012-01-30 Thread Gilles Darold
Hello,

Yesterday I was facing a little issue with a piece of backup software
(NetBackup) that does not report errors when a post-backup script is run.
The problem is that this script executes pg_stop_backup(), and if there's
any failure PostgreSQL keeps running in on-line backup mode. So the backup
is not complete, and the next one fails as well because the call to
pg_start_backup() will fail.

After some time searching for a Pg system administration function like
pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one.
The minor patch attached here adds this administrative function so that it
can be used alongside the other backup control functions. Outside the
documentation it is a very small patch, because the function is only a
wrapper around the internal BackupInProgress() function, just as
pg_is_in_recovery() calls RecoveryInProgress().

I hope it can be included, as it could save time for sysadmins who want to
monitor the on-line backup process and do not have SQL knowledge. That
said, it's always possible to check whether the backup_label file exists
under PGDATA, but that has to be done on the host with postgres
privileges. The other way is to create a plpgsql function with SECURITY
DEFINER that has the same effect as the patch above, except that it
requires some SQL knowledge. I give it here for web searchers, thanks to
Guillaume and Marc.

CREATE OR REPLACE FUNCTION pg_is_in_backup ( )
RETURNS BOOLEAN AS $$
DECLARE is_exists BOOLEAN;
BEGIN
-- Set a secure search_path: trusted schemas, then 'pg_temp'.
PERFORM pg_catalog.set_config('search_path', 'pg_temp', true);
SELECT ((pg_stat_file('backup_label')).modification IS NOT NULL)
INTO is_exists;
RETURN is_exists;
EXCEPTION
WHEN undefined_file THEN
  RETURN false;
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
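
With either the wrapper above or the patched built-in in place (this
assumes one of them exists), a monitoring check reduces to a single query,
for example:

SELECT CASE WHEN pg_is_in_backup()
            THEN 'WARNING: server is still in on-line backup mode'
            ELSE 'OK'
       END AS backup_mode_check;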

Regards,

-- 
Gilles Darold
http://dalibo.com - http://dalibo.org

diff -ru postgresql/doc/src/sgml/func.sgml postgresql-is_in_backup-patch/doc/src/sgml/func.sgml
--- postgresql/doc/src/sgml/func.sgml	2012-01-26 23:01:31.956613398 +0100
+++ postgresql-is_in_backup-patch/doc/src/sgml/func.sgml	2012-01-26 23:14:29.468613019 +0100
@@ -14371,6 +14371,9 @@
     <primary>pg_current_xlog_location</primary>
    </indexterm>
    <indexterm>
+    <primary>pg_is_in_backup</primary>
+   </indexterm>
+   <indexterm>
     <primary>pg_start_backup</primary>
    </indexterm>
    <indexterm>
@@ -14424,6 +14427,14 @@
      </row>
      <row>
       <entry>
+        <literal><function>pg_is_in_backup()</function></literal>
+        </entry>
+       <entry><type>bool</type></entry>
+       <entry>True if an on-line backup is still in progress.
+       </entry>
+      </row>
+      <row>
+       <entry>
        <literal><function>pg_start_backup(<parameter>label</> <type>text</> <optional>, <parameter>fast</> <type>boolean</> </optional>)</function></literal>
        </entry>
       <entry><type>text</type></entry>
diff -ru postgresql/src/backend/access/transam/xlogfuncs.c postgresql-is_in_backup-patch/src/backend/access/transam/xlogfuncs.c
--- postgresql/src/backend/access/transam/xlogfuncs.c	2012-01-26 23:01:32.032613398 +0100
+++ postgresql-is_in_backup-patch/src/backend/access/transam/xlogfuncs.c	2012-01-26 23:16:04.100612972 +0100
@@ -465,3 +465,12 @@
 {
 	PG_RETURN_BOOL(RecoveryInProgress());
 }
+
+/*
+ * Returns true if the server is currently in on-line backup mode (a global state).
+ */
+Datum
+pg_is_in_backup(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_BOOL(BackupInProgress());
+}
diff -ru postgresql/src/include/access/xlog_internal.h postgresql-is_in_backup-patch/src/include/access/xlog_internal.h
--- postgresql/src/include/access/xlog_internal.h	2012-01-26 23:01:32.332613398 +0100
+++ postgresql-is_in_backup-patch/src/include/access/xlog_internal.h	2012-01-26 23:22:51.492612774 +0100
@@ -281,5 +281,6 @@
 extern Datum pg_xlog_replay_pause(PG_FUNCTION_ARGS);
 extern Datum pg_xlog_replay_resume(PG_FUNCTION_ARGS);
 extern Datum pg_is_xlog_replay_paused(PG_FUNCTION_ARGS);
+extern Datum pg_is_in_backup(PG_FUNCTION_ARGS);
 
 #endif   /* XLOG_INTERNAL_H */
diff -ru postgresql/src/include/catalog/pg_proc.h postgresql-is_in_backup-patch/src/include/catalog/pg_proc.h
--- postgresql/src/include/catalog/pg_proc.h	2012-01-26 23:01:32.340613398 +0100
+++ postgresql-is_in_backup-patch/src/include/catalog/pg_proc.h	2012-01-27 09:26:09.918736657 +0100
@@ -2899,6 +2899,8 @@
 DESCR("prepare for taking an online backup");
 DATA(insert OID = 2173 ( pg_stop_backup			PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 25 "" _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
 DESCR("finish taking an online backup");
+DATA(insert OID = 3882 (  pg_is_in_backup		PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 16 "" _null_ _null_ _null_ _null_ pg_is_in_backup _null_ _null_ _null_ ));
+DESCR("true if server is in on-line backup");
 DATA(insert OID = 2848 ( pg_switch_xlog			PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 25 "" _null_ _null_ _null_ _null_ pg_switch_xlog _null_ _null_ _null_ ));
 DESCR("switch to new xlog file");
 DATA(insert OID = 3098 ( pg_create_restore_point	PGNSP PGUID 12 1 0 0 

Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread Josh Berkus

 preferably I would see extract( epoch from timestamp ) to be really
 immutable, i.e. (in my opinion) it should treat incoming data as UTC
 - for epoch calculation.
 Alternatively - perhaps epoch extraction should be moved to specialized
 function, which would have swapped mutability:

We can't have functions which are immutable or not depending on their
inputs.  That way lies madness.

 get_epoch(timestamptz) would be immutable
 while
 get_epoch(timestamp) would be stable

Well, to_epoch, in order to be consistent with other conversion functions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread hubert depesz lubaczewski
On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote:
 
  preferably I would see extract( epoch from timestamp ) to be really
  immutable, i.e. (in my opinion) it should treat incoming data as UTC
  - for epoch calculation.
  Alternatively - perhaps epoch extraction should be moved to specialized
  function, which would have swapped mutability:
 
 We can't have functions which are immutable or not depending on their
 inputs.  That way lies madness.

but this is exactly what's happening now.
extract( ... from timestamp ) is marked as immutable, while in some cases
(namely when you extract the epoch) it should only be stable, because the
function's return value changes with the session's TimeZone setting.
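
On the releases being discussed here the dependency is easy to see; the
expected outputs below assume the pre-fix behaviour described above:

SET timezone = 'UTC';
SELECT extract(epoch FROM timestamp '2012-01-30 00:00:00');
-- 1327881600
SET timezone = 'America/New_York';
SELECT extract(epoch FROM timestamp '2012-01-30 00:00:00');
-- 1327899600, i.e. shifted by the UTC offset of the session's TimeZone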

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-30 Thread Simon Riggs
On Mon, Jan 30, 2012 at 3:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 30.01.2012 17:18, Simon Riggs wrote:

 I asked clearly and specifically for you to hold back committing
 anything. Not sure why you would ignore that and commit without
 actually asking myself or Peter. On a point of principle alone, I
 think you should revert. Working together is difficult if
 communication channels are openly ignored and disregarded.


 You must be referring to this:

 http://archives.postgresql.org/pgsql-hackers/2012-01/msg01406.php

 What I committed in the end was quite different from the version that was in
 reply to, too. If you have a specific objection to the patch as committed,
 please let me know.

I said, "There is much yet to discuss, so please don't think about
committing anything yet."

There's not really any way you could misinterpret those words.


 Peter and I have been working on a new version that seems likely to
 improve performance over your suggestions. We should be showing
 something soon.


 Please post those ideas, and let's discuss them. If it's something simple,
 maybe we can still sneak them into this release. Otherwise, let's focus on
 the existing patches that are pending review or commit.

If you really did want to discuss it, it would have taken you 5
minutes to check whether there was consensus on the patch before
committing it. Your actions betray the opposite of a willingness to
discuss anything.

Yes, I'd like to discuss ideas, not just ram home a half-discussed and
half-finished patch that happens to do things the way you personally
prefer, overriding all inputs.

Especially when you know we're working on another version.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-30 Thread Heikki Linnakangas

On 30.01.2012 21:55, Simon Riggs wrote:

On Mon, Jan 30, 2012 at 3:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 30.01.2012 17:18, Simon Riggs wrote:



Peter and I have been working on a new version that seems likely to
improve performance over your suggestions. We should be showing
something soon.


Please post those ideas, and let's discuss them. If it's something simple,
maybe we can still sneak them into this release. Otherwise, let's focus on
the existing patches that are pending review or commit.


If you really did want to discuss it, it would have taken you 5
minutes to check whether there was consensus on the patch before
committing it. Your actions betray the opposite of a willingness to
discuss anything.

Yes, I'd like to discuss ideas, not just ram home a half-discussed and
half-finished patch that happens to do things the way you personally
prefer, overriding all inputs.

Especially when you know we're working on another version.


Sorry. So, what's the approach you're working on?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-30 Thread Robert Haas
On Fri, Jan 27, 2012 at 8:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Also, I think the general approach is wrong.  The only reason to have
 these pages in shared memory is that we can control access to them to
 prevent write/write and read/write corruption.  Since these pages are
 never written, they don't need to be in shared memory.   Just read
 each page into backend-local memory as it is needed, either
 palloc/pfree each time or using a single reserved block for the
 lifetime of the session.  Let the kernel worry about caching them so
 that the above mentioned reads are cheap.

 right -- exactly.  but why stop at one page?

 If you have more than one, you need code to decide which one to evict
 (just free) every time you need a new one.  And every process needs to
 be running this code, while the kernel is still going to need to make its
 own decisions for the entire system.  It seems simpler to just let the
 kernel do the job for everyone.  Are you worried that a read syscall
 is going to be slow even when the data is presumably cached in the OS?

I think that would be a very legitimate worry.  You're talking about
copying 8kB of data because you need two bits.  Even if the
user/kernel mode context switch is lightning-fast, that's a lot of
extra data copying.
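
For scale, the arithmetic behind the two bits (constants mirroring
src/backend/access/transam/clog.c; standalone sketch only):

#include <stdio.h>

#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

int
main(void)
{
    unsigned int xid = 123456789;
    unsigned int page   = xid / CLOG_XACTS_PER_PAGE;
    unsigned int byteno = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
    unsigned int bshift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    /* one 8 kB page holds the status of 32768 transactions */
    printf("xid %u: clog page %u, byte %u, bit shift %u\n",
           xid, page, byteno, bshift);
    return 0;
}

Reading one such status through a read() syscall means copying the whole
8192-byte page to get at a 2-bit value.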

In a previous commit, 33aaa139e6302e81b4fbf2570be20188bb974c4f, we
increased the number of CLOG buffers from 8 to 32 (except in very
low-memory configurations).  The main reason that change shows a win on
Nate Boley's 32-core test machine appears to be that it avoids the
scenario where, say, 12 people simultaneously want to read 12 different
CLOG pages, so 4 of them have to wait for a buffer to become available
before they can even think about starting a read.  The really bad
latency spikes were happening not because the I/O took a long time, but
because it couldn't be started immediately.  Those spikes are now gone
as a result of that commit.  You can probably get them back with enough
cores, but you'll likely hit a lot of other, more serious problems first.

I assume that if there's any purpose to further optimization here,
it's either because the overall miss rate of the cache is too large,
or because the remaining locking costs are too high.  Unfortunately I
haven't yet had time to look at this patch and understand what it
does, or machine cycles available to benchmark it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.

2012-01-30 Thread Heikki Linnakangas

On 30.01.2012 20:27, Robert Haas wrote:

Either this patch, or something else committed this morning, is
causing make check to hang or run extremely slowly for me.  I think
it's this patch, because I attached to a backend and stopped it a few
times, and all the backtraces look like this:


Yeah, sure looks like it's the group commit commit. It works for me, and 
staring at the code, I have no idea what could be causing it. The 
buildfarm seems happy too, so this is pretty mysterious.


I did find one bug, see attached, but AFAICS it should only cause 
unnecessary wakeups in some corner cases, which is harmless.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
commit 73b479198d9e7e1cbdb2da705a0cd9226b0d8265
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Mon Jan 30 20:56:35 2012 +0200

Fix thinko in the group commit patch.

When releasing a lwlock, if the first waiter in the queue waited for a
shared lock, and all the rest were of the new wait-until-free kind, the
releaseOK flag would not be cleared. That's harmless AFAICS, but it was
not intended.

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index bee35b8..eb9ff95 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -768,9 +768,9 @@ LWLockRelease(LWLockId lockid)
 			while (proc->lwWaitLink != NULL &&
 				   proc->lwWaitLink->lwWaitMode != LW_EXCLUSIVE)
 			{
-				proc = proc->lwWaitLink;
 				if (proc->lwWaitMode != LW_WAIT_UNTIL_FREE)
 					releaseOK = false;
+				proc = proc->lwWaitLink;
 			}
 			}
 			/* proc is now the last PGPROC to be released */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.

2012-01-30 Thread Heikki Linnakangas

On 30.01.2012 22:50, Heikki Linnakangas wrote:

On 30.01.2012 20:27, Robert Haas wrote:

Either this patch, or something else committed this morning, is
causing make check to hang or run extremely slowly for me. I think
it's this patch, because I attached to a backend and stopped it a few
times, and all the backtraces look like this:


Yeah, sure looks like it's the group commit commit. It works for me, and
staring at the code, I have no idea what could be causing it. The
buildfarm seems happy too, so this is pretty mysterious.


And just after sending that, I succeeded in reproducing this. I had to 
lower wal_buffers to a small value to make it happen. I'm debugging this 
now...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-01-30 Thread Euler Taveira de Oliveira
On 30-01-2012 15:33, Gilles Darold wrote:
 Yesterday I was facing a little issue with a piece of backup software
 (NetBackup) that does not report errors when a post-backup script is run.
 The problem is that this script executes pg_stop_backup(), and if there's
 any failure PostgreSQL keeps running in on-line backup mode. So the backup
 is not complete, and the next one fails as well because the call to
 pg_start_backup() will fail.
 
I use something similar to your pl/PgSQL version and was about to propose
something along these lines to core. +1 to include it. Please, add your patch
to the next CF.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.

2012-01-30 Thread Heikki Linnakangas

On 30.01.2012 23:06, Heikki Linnakangas wrote:

On 30.01.2012 22:50, Heikki Linnakangas wrote:

On 30.01.2012 20:27, Robert Haas wrote:

Either this patch, or something else committed this morning, is
causing make check to hang or run extremely slowly for me. I think
it's this patch, because I attached to a backend and stopped it a few
times, and all the backtraces look like this:


Yeah, sure looks like it's the group commit commit. It works for me, and
staring at the code, I have no idea what could be causing it. The
buildfarm seems happy too, so this is pretty mysterious.


And just after sending that, I succeeded to reproduce this. I had to
lower wal_buffers to a small value to make it happen. I'm debugging this
now..


It was a bug in the LWLockRelease code, after all. Fixed. Unfortunately 
this added a couple more instructions to that critical codepath, but I 
think it should still go without notice. Let me know if this doesn't fix 
the hang on your laptop.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-30 Thread Simon Riggs
On Mon, Jan 30, 2012 at 8:04 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 So, what's the approach you're working on?

I had a few days' leave at the end of last week, so there was no time to
fully discuss the next steps with the patch. That's why you were asked
not to commit anything.

You've suggested there was no reason to have the WALwriter be involved,
which isn't the case, and made other comments about latches that weren't
correct either.

The plan here is to allow the WAL flush and the clog update to occur
concurrently, which allows the clog contention and update time to be
completely hidden behind the wait for the WAL flush. That is only
possible if we have the WALwriter involved, since we need two processes
to be actively working at the same time.

It's a relatively minor change and uses code that is already committed
and working, not some just-invented low-level stuff that might not work
right. You might then ask: why the delay? Simply because my absence has
prevented moving forward. We'll have a patch tomorrow.

The theory behind this is clear, but needs some explanation.

There are 5 actions that need to occur at commit
1) insert WAL record
2) optionally flush WAL record
3) mark the clog AND set LSN from (1) if we skipped (2)
4) optionally wait for sync rep
5) remove the proc from the procarray

Dependencies between those actions are these
Step (3) must always happen before (5) otherwise we get race
conditions in visibility checking.
Step (4) must always happen before (5) otherwise we also get race
conditions in disaster cases.
Step (1) must always happen before (2) if it happens
Step (1) must always happen before (3) if we skipped (2)

Notice that step (2) and step (3) are actually independent of each other.

So an improved design for commit is to
2) request flush up to LSN, but don't wait
3) mark the clog and set LSN
4) wait for LSN once, either for walwriter or walsender to release us

This is free of race conditions as long as step (3) marks each clog
page with a valid LSN, just as we would do for asynchronous commit.

Marking the clog with an LSN ensures that we issue XLogFlush(LSN) on
the clog page before it is written, so we always get WAL flushed to
the desired LSN before the clog mark appears on disk.

Does this cause any other behaviour? No, because the LSN marked on the
clog is always flushed by the time we hit step (5), so there is no
delay in any hint bit setting, or any other effect.

So step (2) requests the flush, which is then performed by the WALwriter.
The backend then performs (3) while the flush takes place, and then waits
at step (4) to be woken.

We only wait once in step 4, rather than waiting for flush at step (2)
and then waiting again at step (4).

So we use the existing code path for TransactionIdAsyncCommitTree()
yet we wait at step (4) using the SyncRep code.

Step 5 happens last, as always.

There are two benefits to this approach
* The clog update happens for free since it is hidden behind a
longer running task
* The implementation uses already tested and robust code for SyncRep
and AsyncCommit

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-30 Thread Noah Misch
On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
 The biggest item remaining is the point you raised about multixactid
 wraparound.  This is closely related to multixact truncation and the way
 checkpoints are to be handled.  If we think that MultiXactId wraparound
 is possible, and we need to involve autovacuum to keep it at bay, then I

To prove it possible, we need to prove there exists some sequence of operations
consuming N xids and M > N multixactids.  Have N transactions key-lock N-1
rows apiece, then have each of them key-lock one of the rows locked by each
other transaction.  This consumes N xids and N(N-1) multixactids.  I believe
you could construct a workload with N! multixact usage, too.

Existence proofs are one thing, real workloads another.  My unsubstantiated
guess is that multixactid use will not overtake xid use in bulk on a workload
not specifically tailored to do so.  So, I think it's enough to notice it,
refuse to assign a new multixactid, and tell the user to clear the condition
with a VACUUM FREEZE of all databases.  Other opinions would indeed be useful.

 think the only way to make that work is to add another column to
 pg_class so that each table's oldest multixact is tracked, same as we do
 with relfrozenxid for Xids.  If we do that, I think we can do away with
 most of the MultiXactTruncate junk I added -- things would become a lot
 simpler.  The cost would be bloating pg_class a bit more.  Are we okay
 with paying that cost?  I asked this question some months ago and I
 decided that I would not add the column, but I am starting to lean the
 other way.  I would like some opinions on this.

That's not the only way; autovacuum could just accelerate normal freezing to
advance the multixactid horizon indirectly.  Your proposal could make it far
more efficient, though.

 You asked two questions about WAL-logging locks: one was about the level
 of detail we log for each lock we grab; the other was about
 heap_xlog_update logging the right info.  AFAICS, the main thing that
 makes detailed WAL logging necessary is hot standbys.  That is, the
 standby must have all the locking info so that concurrent transactions
 are similarly locked as in the master ... or am I wrong in that?  (ISTM
 this means I need to fix heap_xlog_update so that it faithfully
 registers the lock info we're storing, not just the current Xid).

Standby servers do not need accurate tuple locks, because it's not possible to
wait on a tuple lock during recovery.  (By the way, the question about log
detail was just from my own curiosity.  We don't especially need an answer to
move forward with this patch, unless you want one.)

 * Columns that are part of the key
   Noah thinks the set of columns should only consider those actually 
 referenced
   by keys, not those that *could* be referenced.

Well, do you disagree?  To me it's low-hanging fruit, because it isolates the
UPDATE-time overhead of this patch to FK-referenced tables rather than all
tables having a PK or PK-like index (often just all tables).

   Also, in a table without columns, are all columns part of the key, or is the
   key the empty set?  I changed HeapSatisfiesHOTUpdate but that seems 
 arbitrary.

It does seem arbitrary.  What led you to switch in a later version?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-01-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But the immediate problem is that pg_dump.c is heavily reliant on
 global variables, which isn't going to fly if we want this code to use
 threads on Windows (or anywhere else).  It's also bad style.  So I
 suggest that we refactor pg_dump.c to get rid of g_conn and g_fout.

I've looked at that a bit in the past and decided that the notational
overhead would be too much.  OTOH, if we're going to be forced into it
in order to support parallel pg_dump, we might as well do it first in a
separate patch.

 ...  But it
 seems possible that we might someday want to dump from one database
 and restore into another database at the same time, so maybe we ought
 to play it safe and use different variables for those things.  So I'm
 inclined to think we could just add a PGconn *remote_connection
 argument to the Archive structure (to go with all of the similarly
 named remote fields, all of which describe the DB to be dumped) and
 then that would be available everywhere that the Archive itself is.

I always thought that the "remote" terminology was singularly unhelpful.
It implies there's a local connection floating around somewhere, which
of course there is not, and it does nothing to remind you of whether the
connection leads to a database being dumped or a database being restored
into.  If we are going to have two fields, could we name them something
less opaque, perhaps src_connection and dest_connection?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread Tom Lane
hubert depesz lubaczewski dep...@depesz.com writes:
 On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote:
 We can't have functions which are immutable or not depending on their
 inputs.  That way lies madness.

 but this is exactly what's happening now.

Well, the current marking is clearly incorrect.  What to do about that
is a bit less clear --- should we downgrade the marking, or change the
function's behavior so that it really is immutable?

I haven't formed an opinion on that myself, other than to think that
it's something that requires more than a moment's thought.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-30 Thread Kyotaro HORIGUCHI
This is the fixed version of dblink.c for the row processor.

 i'll re-send the properly fixed patch for dblink.c later.

- malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)

- storeHandler() now returns FALSE on malloc failure. Garbage
  cleanup is done in dblink_fetch() or dblink_record_internal().
  The behavior of dblink reporting this error as 'unknown
  error/could not execute query' in the user session is the same
  as before.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..7a82ea1 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
    int2vector *pkattnums_arg, int32 pknumatts_arg,
    int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+	(errcode(ERRCODE_DATATYPE_MISMATCH),
+	 errmsg("remote query result rowtype does not match "
+			"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_CURSOR_NAME),
  errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +754,217 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK 
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred 

Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread Josh Berkus
On 1/30/12 5:41 PM, Tom Lane wrote:
 hubert depesz lubaczewski dep...@depesz.com writes:
 On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote:
 We can't have functions which are immutable or not depending on their
 inputs.  That way lies madness.
 
 but this is exactly what's happening now.
 
 Well, the current marking is clearly incorrect.  What to do about that
 is a bit less clear --- should we downgrade the marking, or change the
 function's behavior so that it really is immutable?

AFAIK, the only case which is NOT immutable is extract(epoch FROM
timestamp without time zone), no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-01-30 Thread Ashutosh Bapat
I don't believe that the problem has anything to do with the names of

 other tables that might happen to exist in the database.  It's a matter
 of what RTE names/aliases are exposed for variable references in
 different parts of the query.


Names of other tables come into the picture when we schema-qualify the fake
aliases in the generated query. See the examples in the first post.


regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-30 Thread Tom Lane
[ Note: please follow-up to pgsql-hackers not pgsql-general; I think
  this discussion needs to move there ]

hubert depesz lubaczewski dep...@depesz.com writes:
 On Mon, Jan 30, 2012 at 11:30:51AM -0500, Tom Lane wrote:
 That is way too vague for my taste, as you have not shown the pg_dump
 options you're using, for example.

 i tried to explain that the options don't matter, but here we go. full
 example:
 [ example showing pg_dump's odd behavior for extension config tables ]

[ traces through that with gdb... ]

As I suspected, the behavioral change from 9.1 to HEAD is not
intentional.  It is an artifact of commit
7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not
quite, entirely broken.  I won't enumerate its shortcomings here,
because they're not really relevant, but it does seem appropriate to
discuss exactly what we think *should* happen for tables created inside
extensions.

The original design intention for non-config tables was, per the manual:

Ordinarily, if a table is part of an extension, neither the
table's definition nor its content will be dumped by pg_dump.

the assumption being that both the definition and the content would be
re-loaded by executing the extension's SQL script.  The purpose of
pg_extension_config_dump() is to allow part or all of the data in the
table to be treated as user data and thus dumped; this is assumed to
be data not supplied by the extension script but by subsequent user
insertions.
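
For readers who haven't used the mechanism: the marking is done from inside
the extension's script (the table and column names here are invented):

CREATE TABLE my_ext_config (
    key            text PRIMARY KEY,
    value          text,
    standard_entry boolean NOT NULL DEFAULT false
);
-- dump every row of the table as user data:
SELECT pg_catalog.pg_extension_config_dump('my_ext_config', '');
-- or dump only the rows added after the script ran:
-- SELECT pg_catalog.pg_extension_config_dump('my_ext_config',
--        'WHERE NOT standard_entry');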

I don't recall that we thought very hard about what should happen when
pg_dump switches are used to produce a selective dump, but ISTM
reasonable that if it's user data then it should be dumped only if
data in a regular user table would be.  So I agree it's pretty broken
that pg_dump -t foo will dump data belonging to a config table not
selected by the -t switch.  I think this should be changed in both HEAD
and 9.1 (note that HEAD will presumably return to the 9.1 behavior once
that --exclude-table-data patch gets fixed).

What's not apparent to me is whether there's an argument for doing more
than that.  It strikes me that the current design is not very friendly
towards the idea of an extension that creates a table that's meant
solely to hold user data --- you'd have to mark it as "config", which
seems a bit unfortunate terminology for that case.  Is it important to
do something about that, and if so what?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?

2012-01-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 1/30/12 5:41 PM, Tom Lane wrote:
 Well, the current marking is clearly incorrect.  What to do about that
 is a bit less clear --- should we downgrade the marking, or change the
 function's behavior so that it really is immutable?

 AFAIK, the only case which is NOT immutable is extract(epoch FROM
 timestamp without time zone), no?

That's the only one we currently know is not immutable.  But before we
make any decisions, I think it'd be a good idea to scrutinize all the
other cases too, because obviously this area has gotten some careless
hacking (*) done on it in the past.

regards, tom lane

(*) I have a nasty feeling that the carelessness was mine.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-01-30 Thread Joachim Wieland
On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas robertmh...@gmail.com wrote:
 But the immediate problem is that pg_dump.c is heavily reliant on
 global variables, which isn't going to fly if we want this code to use
 threads on Windows (or anywhere else).  It's also bad style.

Technically, since most of pg_dump.c dumps the catalog and since this
isn't done in parallel but only in the master process, most functions
need not be changed for the parallel restore. Only those that are
called from the worker threads need to be changed; this has been done
in e.g. dumpBlobs(), the change that you quoted upthread.

 But it
 seems possible that we might someday want to dump from one database
 and restore into another database at the same time, so maybe we ought
 to play it safe and use different variables for those things.

Actually I've tried that but in the end concluded that it's best to
have at most one database connection in an ArchiveHandle if you don't
want to do a lot more refactoring. Besides the normal connection
parameters like host, port, ... there's also std_strings, encoding,
savedPassword, currUser/currSchema, lo_buf, remoteVersion ... for which
it wouldn't be obvious where they belonged.

Speaking about refactoring, I'm happy to also throw in the idea to
make the dump and restore more symmetrical than they are now. I kinda
disliked RestoreOptions* being a member of the ArchiveHandle without
having something similar for the dump. Ideally I'd say there should be
a DumpOptions and a RestoreOptions field (with a struct Connection
being part of them containing all the different connection
parameters). They could be a union if you wanted to allow only one
connection, or not if you want more than one.
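
Concretely, the layout being described might look roughly like this
(illustrative only -- these are not pg_dump's existing structs, and the
field names are invented):

typedef struct Connection
{
    char       *host;
    char       *port;
    char       *dbname;
    char       *username;
    int         encoding;
    int         remoteVersion;
} Connection;

typedef struct DumpOptions
{
    Connection  conn;           /* the database being dumped */
    int         schemaOnly;
    int         dataOnly;
} DumpOptions;

typedef struct RestoreOptions  /* pg_dump already has a struct of this name;
                                * it is reused here purely for the sketch */
{
    Connection  conn;           /* the database being restored into */
    int         createDB;
    int         singleTxn;
} RestoreOptions;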

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-30 Thread Marko Kreen
On Tue, Jan 31, 2012 at 4:59 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@oss.ntt.co.jp wrote:
 This is fixed version of dblink.c for row processor.

 i'll re-send the properly fixed patch for dblink.c later.

 - malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)

 - storeHandler() now returns FALSE on malloc failure. Garbage
  cleanup is done in dblink_fetch() or dblink_record_internal().
  The behavior that this dblink displays this error as 'unkown
  error/could not execute query' on the user session is as it did
  before.

Another thing: if realloc() fails, the old pointer stays valid.
So the return value needs to be assigned to a temporary variable
first, before overwriting the old pointer.
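
That is the usual pattern, sketched standalone:

#include <stdlib.h>

/* Keep the old buffer until realloc() is known to have succeeded,
 * so nothing is lost (or leaked) when the allocation fails. */
int
grow_buffer(char **buf, size_t *buflen, size_t needed)
{
    if (needed > *buflen)
    {
        char   *tmp = realloc(*buf, needed);

        if (tmp == NULL)
            return 0;           /* *buf is still valid and still owned */
        *buf = tmp;
        *buflen = needed;
    }
    return 1;
}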

And it seems malloc() is preferable to palloc(), to avoid
exceptions inside the row processor.  Although the latter
could be made to work, it might be unnecessary
complexity.  (Store the current PGresult into remoteConn?)

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-30 Thread Heikki Linnakangas

On 31.01.2012 01:35, Simon Riggs wrote:

The plan here is to allow WAL flush and clog updates to occur
concurrently. Which allows the clog contention and update time to be
completely hidden behind the wait for the WAL flush. That is only
possible if we have the WALwriter involved since we need two processes
to be actively involved.

...

The theory behind this is clear, but needs some explanation.

There are 5 actions that need to occur at commit
1) insert WAL record
2) optionally flush WAL record
3) mark the clog AND set LSN from (1) if we skipped (2)
4) optionally wait for sync rep
5) remove the proc from the procarray


 ...


Notice that step (2) and step (3) are actually independent of each other.

So an improved design for commit is to
2) request flush up to LSN, but don't wait
3) mark the clog and set LSN
4) wait for LSN once, either for walwriter or walsender to release us


That seems like a pretty marginal gain. If you're bound by the speed of 
fsyncs, this will reduce the latency by the time it takes to mark the 
clog, which is tiny in comparison to all the other stuff that needs to 
happen, like flushing the WAL. And that's ignoring any additional 
overhead caused by the signaling between processes. If you're bound by 
CPU capacity, this doesn't help at all because it just moves the work 
around.


Anyway, this is quite different from the original goal and patch for 
group commit, so can we please leave this for 9.3 and move on with the 
review of the pending 9.2 patches?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers