Re: [HACKERS] GiST range-contained-by searches versus empty ranges

2011-11-26 Thread Tom Lane
I wrote:
> I started to wonder why the test in range_gist_consistent_int() for
> RANGESTRAT_CONTAINED_BY was "return true" (ie, search the entire index)
> rather than range_overlaps, which is what is tested in the comparable
> case in rtree_internal_consistent().  The regression tests showed me
> how come: an empty-range index entry should be reported as being
> contained by anything, but range_overlaps won't necessarily find such
> entries.  (The rtree case is all right because there is no equivalent of
> an empty range in boxes, circles, or polygons.)

> I am not satisfied with this state of affairs though: range_contained_by
> really ought to be efficiently indexable, but with the current coding
> an index search is nearly useless.  Also, so far as I can see, the
> current penalty function allows empty-range entries to be scattered
> basically anywhere in the index, making a search for "= empty" pretty
> inefficient too.

> The first solution that comes to mind is to make the penalty and
> picksplit functions forcibly segregate empty ranges from others, that is
> a split will never put empty ranges together with non-empty ones.  Then,
> we can assume that a non-empty internal node doesn't represent any empty
> leaf entries, and avoid descending to it when it doesn't overlap the
> target range.  Then the equality-search case could be improved too.

After looking through the code a bit, it seems that this isn't possible
to solve with a localized fix after all.  Yes, the opclass-specific
picksplit function can positively guarantee to separate empty ranges
from others when it splits a page ... but it doesn't have control over
what happens when an item is added to an index without a page split
occurring.  Consider a scenario where we have a multi-page GiST index
containing no empty ranges, and an empty range item needs to be added.
As the code is currently written, the central GiST code will choose one
of the existing root-page items to add the new item to; there is no way
to persuade it that a new top-level item would be a better idea.  What's
more, if we try to do so by having the penalty function return infinity
for adding an empty range to a nonempty item, we'll end up with empty
ranges being randomly added to any/all of the items, since we'll get the
exact same penalty for all the nonempty items.  So, far from segregating
the empties, we'll just end up with them cluttered all over the index,
just like now.

This is not really a new problem.  There is code in there that is trying
to segregate NULLs from non-NULL entries, and it is equally
broken/ineffective.

I'm inclined to propose that we should add some logic to say that
merging a new item into an existing one is forbidden if the penalty
function returns plus-infinity for the case.  If all existing items on a
page return infinity, a new item must be added to the page (possibly
causing a page split) instead of inserting into any existing one.
(Of course, gistpenalty() should be fixed to return infinity, not just a
randomly chosen large value, for the null-and-not-null case.)
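
To make the proposed rule concrete, here is a rough sketch of the shape the
opclass penalty function could take (illustrative only -- the names are made
up and this is not patch-ready code; the real function would work on GiST
entry keys rather than bare flags):

#include <math.h>
#include <stdbool.h>

/* Return +infinity when asked to merge an empty range into a non-empty
 * subtree entry (or vice versa); core GiST code applying the proposed
 * "never merge on infinite penalty" rule would then have to add a new
 * downlink (possibly splitting the page) instead. */
static float
range_penalty_sketch(bool orig_is_empty, bool new_is_empty, double enlargement)
{
	if (orig_is_empty != new_is_empty)
		return INFINITY;			/* refuse to mix empty and non-empty */
	return (float) enlargement;		/* ordinary enlargement-based penalty */
}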

Comments?

regards, tom lane



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Kevin Grittner
Andres Freund  wrote:
 
> All current x86 cpus use 64bytes.
 
That matches what I found in recent research on the topic.
 
-Kevin




Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Andres Freund
On Saturday, November 26, 2011 11:39:23 PM Robert Haas wrote:
> On Sat, Nov 26, 2011 at 5:28 PM, Andres Freund  wrote:
> > On Saturday, November 26, 2011 09:52:17 PM Tom Lane wrote:
> >> I'd just as soon keep the fields in a logical order.
> > 
> > Btw, I don't think the new order is necessarily worse than the old one.
> 
> You forgot to attach the benchmark results.
> 
> My impression is that cache lines on modern hardware are 64 or 128
> *bytes*, in which case this wouldn't matter a bit.
> 
> But testing is even better than guessing.
Being prodded like that, I ran a very quick benchmark on my workstation.
Unfortunately that means I cannot work during that time, which is why I kept it
rather short...

That machine has 2 E5520s @ 2.27GHz, which means 2 (sockets) * 4 (cores) *
2 (threads), and 24GB of RAM.

Data was initialized with: pgbench -h /tmp/ --unlogged-tables -i -s 20 pgbench


pgbench -h /tmp/ pgbench -S -j 16 -c 16 -T 60

origsnap:   92825.743958 93145.110901 93389.915474 93175.482351
reordersnap:93560.183272 93613.333494 93495.263012 93523.368489

pgbench -h /tmp/ pgbench -S -j 32 -c 32 -T 60

origsnap:   81846.743329 81545.175672 81702.755226 81576.576435 
81228.154119 81546.047708 81421.436262
reordersnap:81823.479196 81787.784508 81820.242145 81790.263415 
81762.421592 81496.333144 81732.088876

At that point I noticed I had accidentally run with a nearly stock config...
An even shorter run with a more appropriate config yielded:


pgbench -h /tmp/ pgbench -S -j 32 -c 32 -T 20

origsnap:   102234.664020 102003.449741 102119.509053 101722.410387 
101973.651318 102056.440561
reordersnap:103444.877879 103385.08 103302.318923 103372.659486 
103330.157612 103313.833821



Looks like a win to me. Even on this comparatively small machine.

Andres



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Andres Freund
On Saturday, November 26, 2011 11:39:23 PM Robert Haas wrote:
> On Sat, Nov 26, 2011 at 5:28 PM, Andres Freund  wrote:
> > On Saturday, November 26, 2011 09:52:17 PM Tom Lane wrote:
> >> I'd just as soon keep the fields in a logical order.
> > 
> > Btw, I don't think the new order is necessarily worse than the old one.
> 
> You forgot to attach the benchmark results.
> 
> My impression is that cache lines on modern hardware are 64 or 128
> *bytes*, in which case this wouldn't matter a bit.
All current x86 CPUs use 64 bytes. The 2nd 128bit reference was a typo. Sorry
for that.
And why is 72 => 56 *bytes* (I even got that one right) not relevant for 64-byte
cachelines?

And yeah, I didn't add benchmark results. I don't think I *have* to do that
when making suggestions to somebody trying to improve something specific. I
also currently don't have hardware where I can sensibly run at a high enough
concurrency to see GetSnapshotData take ~40% of runtime.
To my knowledge, additional cacheline references around synchronized access
can hurt...

Andres



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Robert Haas
On Sat, Nov 26, 2011 at 5:28 PM, Andres Freund  wrote:
> On Saturday, November 26, 2011 09:52:17 PM Tom Lane wrote:
>> I'd just as soon keep the fields in a logical order.
> Btw, I don't think the new order is necessarily worse than the old one.

You forgot to attach the benchmark results.

My impression is that cache lines on modern hardware are 64 or 128
*bytes*, in which case this wouldn't matter a bit.

But testing is even better than guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Andres Freund
On Saturday, November 26, 2011 09:52:17 PM Tom Lane wrote:
> I'd just as soon keep the fields in a logical order.
Btw, I don't think the new order is necessarily worse than the old one.

Andres



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Andres Freund
On Saturday, November 26, 2011 09:52:17 PM Tom Lane wrote:
> Andres Freund  writes:
> > You could also try if it makes a difference reducing SnapshotData to one
> > instead of two cachelines. The data itself fits into one without
> > problems. Trivial patch attached.
> On what grounds do you argue that this patch gets SnapshotData into one
> cacheline today (and on what hardware), or that it will continue to do
> so in the future?  If we're this close to the edge then any future
> addition to the struct will destroy the point.  I'd just as soon keep
> the fields in a logical order.
To my knowledge there is no current and supported (i.e. I don't count Itanium)
hardware that doesn't have 64-byte cachelines, except in the embedded market.
I also don't see a movement towards 128-bit CPUs or anything else requiring
bigger pointers.
If platforms with 128-byte cachelines or such were to gain popularity, nothing
would be lost by that change.
Which change are you "afraid" of?

Now the holes I talked about obviously were on a 64-bit machine. But there are
a few situations where improving the layout so it fits with 8-byte alignment for
pointers makes the situation worse for 32-bit. Looking a bit carefully at the
changes one makes should catch those problems.

Also it's not *that* close to overflowing again. It was 72 bytes before; now
it's 56 on a 64-bit x86 machine with the Linux ABI.
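
For what it's worth, the claim is easy to re-check on any particular ABI by
compiling a mirror of the layout and printing sizeof. The struct below is only
an illustrative stand-in for SnapshotData (generic C types in place of
TransactionId and friends), and _Static_assert needs a C11 compiler:

#include <stdio.h>
#include <stdbool.h>

typedef struct LayoutCheck
{
	void	   *satisfies;		/* stand-in for the snapshot-test function pointer */
	unsigned int xmin;
	unsigned int xmax;
	void	   *xip;
	unsigned int xcnt;
	int			subxcnt;
	void	   *subxip;
	bool		suboverflowed;
	bool		takenDuringRecovery;
	bool		copied;
	unsigned int curcid;
	unsigned int active_count;
	unsigned int regd_count;
} LayoutCheck;

_Static_assert(sizeof(LayoutCheck) <= 64, "does not fit in one 64-byte cacheline");

int
main(void)
{
	/* should print 56 on a typical LP64 x86 ABI, matching the number above */
	printf("sizeof(LayoutCheck) = %zu\n", sizeof(LayoutCheck));
	return 0;
}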

I agree that logical order is important. I don't propose changing all structs
in pg mechanically.

It's sad that there is no sensible method to let the compiler safely reorder
struct members across compilers (and compiler versions) and platforms...

Andres



Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64

2011-11-26 Thread Andrew Dunstan



On 11/24/2011 06:29 AM, Lars Kanis wrote:

Isn't it better to check the value of the macros themselves rather than checking for
system-dependent macros that do not directly relate to the issue?
specifically for getaddrinfo.c case I think
#if EAI_NODATA != EAI_NONAME
is a better check than checking for
#if !defined(__MINGW64_VERSION_MAJOR)&&  !defined(WIN32_ONLY_COMPILER)  /* 
MSVC/WIN64 duplicate */

Yes it's better and it works for all described test environments.


For the win32.h, I really don't understand why _WINSOCKAPI_ was defined before

some google suggests that defining _WINSOCKAPI_ before  prevents
inclusion of winsock.h but that does not have relation to inclusion of
  and if  is included first, it should be ok.

If this guess is right, perhaps it could be better to remove the three lines.
#if !defined(WIN64) || defined(WIN32_ONLY_COMPILER)
#define _WINSOCKAPI_
#endif




No, this broke some compilers, IIRC (probably the native mingw compiler, 
which is in use by several buildfarm members). Getting this right was 
very tricky and time-consuming when I was adding support for the 64 bit 
mingw-w64 compiler, and there were a couple of rounds of breakage.


I'm therefore much more inclined to go the way of your earlier patch, 
which seems much less risky.




I only changed this for consistency. For me, it works without that define in 
all test
environments, too.


+/* __MINGW64_VERSION_MAJOR is related to both 32/64 bit gcc compiles by
+ * mingw-w64, however it gets defined only after
Why not use __MINGW32__, which is defined without including any headers?



Because it's defined by other than mingw-w64 compilers.


We have a bunch of compilers to support here. There are LOTS of compiler 
scenarios on Windows (several versions of MSVC, 32bit and 64bit 
mingw-w64, native mingw gcc, and a couple of Cygwin based compilers), 
and keeping track of them all and making sure they don't break can be a 
pain.


cheers

andrew






Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Tom Lane
Andres Freund  writes:
> You could also try if it makes a difference reducing SnapshotData to one 
> instead of two cachelines. The data itself fits into one without problems.
> Trivial patch attached.

On what grounds do you argue that this patch gets SnapshotData into one
cacheline today (and on what hardware), or that it will continue to do
so in the future?  If we're this close to the edge then any future
addition to the struct will destroy the point.  I'd just as soon keep
the fields in a logical order.

regards, tom lane



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Andres Freund
Hi,

On Saturday, November 26, 2011 04:52:50 PM Pavan Deolasee wrote:
> I think now that we have reduced the run time of the function itself,
> we should now try to reduce the number of times the function is
> called. Robert proposed a way to reduce the number of calls per
> transaction. I think we can go one more step further and reduce the
> number for across the transactions.
You could also try if it makes a difference reducing SnapshotData to one 
instead of two cachelines. The data itself fits into one without problems.
Trivial patch attached.

Generally I think we should check that for most of the more commonly used
structures; we have many with too much padding.

Andres
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 859e52a..b9c2209 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -46,13 +46,14 @@ typedef struct SnapshotData
 	 */
 	TransactionId xmin;			/* all XID < xmin are visible to me */
 	TransactionId xmax;			/* all XID >= xmax are invisible to me */
-	uint32		xcnt;			/* # of xact ids in xip[] */
 	TransactionId *xip;			/* array of xact IDs in progress */
+	uint32		xcnt;			/* # of xact ids in xip[] */
 	/* note: all ids in xip[] satisfy xmin <= xip[i] < xmax */
 	int32		subxcnt;		/* # of xact ids in subxip[] */
 	TransactionId *subxip;		/* array of subxact IDs in progress */
 	bool		suboverflowed;	/* has the subxip array overflowed? */
 	bool		takenDuringRecovery;	/* recovery-shaped snapshot? */
+	bool		copied;			/* false if it's a static snapshot */
 
 	/*
 	 * note: all ids in subxip[] are >= xmin, but we don't bother filtering
@@ -61,7 +62,6 @@ typedef struct SnapshotData
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
 	uint32		active_count;	/* refcount on ActiveSnapshot stack */
 	uint32		regd_count;		/* refcount on RegisteredSnapshotList */
-	bool		copied;			/* false if it's a static snapshot */
 } SnapshotData;
 
 /*



[HACKERS] psql line number reporting from stdin

2011-11-26 Thread Peter Eisentraut
There is a long-standing oddity in psql that running

psql -f foo.sql

returns error messages with file name and line number, like

psql:foo.sql:1: ERROR:  syntax error at or near "foo"

but running

psql < foo.sql

does not.  I suggest we change the latter to print

psql:<stdin>:1: ERROR:  syntax error at or near "foo"

Other examples for the use of the spelling "<stdin>" in this context
include gcc and slonik.

Error messages printed in interactive mode will not be affected, of
course.

Patch attached.
diff --git i/src/bin/psql/common.c w/src/bin/psql/common.c
index 5ab736e..18260f1 100644
--- i/src/bin/psql/common.c
+++ w/src/bin/psql/common.c
@@ -166,6 +166,8 @@ psql_error(const char *fmt,...)
 
 	if (pset.inputfile)
 		fprintf(stderr, "%s:%s:" UINT64_FORMAT ": ", pset.progname, pset.inputfile, pset.lineno);
+	else if (!isatty(fileno(stdin)))
+		fprintf(stderr, "%s:<stdin>:" UINT64_FORMAT ": ", pset.progname, pset.lineno);
 	va_start(ap, fmt);
 	vfprintf(stderr, _(fmt), ap);
 	va_end(ap);



Re: [HACKERS] Patch: Perl xsubpp

2011-11-26 Thread Andrew Dunstan



On 10/12/2011 08:55 PM, Alex Hunsaker wrote:

On Wed, Oct 12, 2011 at 17:53, David E. Wheeler  wrote:

On Sep 15, 2011, at 3:04 PM, Alex Hunsaker wrote:


Close, seems I was wrong about the typemap ExtUtils::ParseXS does not
install a new one so we still need to point to the one in privlib.
Also xsubpp is not executable so the test should be -r or something.

Also don't think we should change the configure switch tests to test XSUBPPDIR.

Find those plus some minor typos fixed in the attached.

--

Doesn't look like this has been applied yet. I think it ought to be backported, 
too, frankly. Did I miss it?

Nah, probably should add it to the next commit fest so it does not get
forgotten.



committed.

cheers

andrew



Re: [HACKERS] Notes on implementing URI syntax for libpq

2011-11-26 Thread Alexander Shulgin

Excerpts from Robert Haas's message of Thu Nov 24 13:57:17 +0200 2011:
> 
> I think it would be really weird not to support user:pw@host:port.  You can 
> presumably also support the JDBC style for backward compatibility, but I 
> don't think we should adopt that syntax as project standard.

By the way, if we're already considering this, what about special syntax for 
SSL, instead of the JDBC's "&ssl=true" thingy?

Given that the default sslmode is "prefer" I assume libpq tries SSL first, then 
falls back to plain text if that's not available.

To me, it looks much more natural if the fact that SSL is/should be used is 
stated early in the URI syntax, like: "https://";, "svn+ssh://", etc., rather 
than in the query parameters (if the parameters were to be passed to remote 
service to process, like it's done with HTTP[S], this would not make any sense 
at all.)

But given that sslmode can currently be either of: "disable", "allow", 
"prefer", "require", "verify-ca" or "verify-full" (and who knows if any new 
allowed mode could show up later,) allowing "&sslmode=whatever" makes sense.  
Note, that this is not the same as "&ssl=whatever".

So how about this:

  postgresql:ssl://user:pw@host:port/dbname?sslmode=...

The "postgresql:ssl://" designator would assume "sslmode=require" if not
overridden in extra parameters, and "postgresql://" would imply "sslmode=prefer".
 And to disable SSL you would pick either designator and append 
"sslmode=disable".

The JDBC's "ssl=true" will translate to "sslmode=require".
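
For illustration, my reading of the proposal gives roughly this mapping
(example URIs only, nothing here is settled syntax):

  postgresql://user@host/db                          => sslmode=prefer (default)
  postgresql:ssl://user@host/db                      => sslmode=require
  postgresql:ssl://user@host/db?sslmode=verify-full  => sslmode=verify-full
  postgresql://user@host/db?sslmode=disable          => sslmode=disable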

If we can decide on this, we should also put reasonable effort into making JDBC 
support the same syntax.

Thoughts?

--
Alex



Re: [HACKERS] why do we need two snapshots per query?

2011-11-26 Thread Dimitri Fontaine
Hi,

Robert Haas  writes:
> On Sun, Nov 13, 2011 at 8:57 PM, Robert Haas  wrote:
>> In the -M extended case, we take a snapshot from exec_parse_message(),
>> and the same two in the exec_bind_message() call that are taken in the
>> -M prepared case.  So reducing the prepared case from two snapshots to
>> one will reduce the extended case from three snapshots to two, thus
>> saving one snapshot per query regardless of how it's executed.

I like the idea and I think it's better semantics to use the same
snapshot for planning and executing in the simple query case.

I didn't try to reproduce the performance benefits seen by Robert here,
nor did I try to double-check for compilation warnings etc.  I guess
that reviewing a committer's patch allows for being not as thorough :)

> + /* Done with the snapshot used for parameter I/O and parsing/planning */
> + if (snapshot_set)
> + PopActiveSnapshot();

This comment needs adjusting.

> diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
> index 466727b..c41272b 100644
> --- a/src/backend/tcop/pquery.c
> +++ b/src/backend/tcop/pquery.c
> @@ -455,7 +455,7 @@ FetchStatementTargetList(Node *stmt)
>   * tupdesc (if any) is known.
>   */
>  void
> -PortalStart(Portal portal, ParamListInfo params, Snapshot snapshot)
> +PortalStart(Portal portal, ParamListInfo params, bool use_active_snapshot)

You need to be editing the comments for this function.  To be specific
you didn't update this text:

 * The caller can optionally pass a snapshot to be used; pass InvalidSnapshot
 * for the normal behavior of setting a new snapshot.  This parameter is
 * presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
 * to be used for cursors).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Notes on implementing URI syntax for libpq

2011-11-26 Thread Alexander Shulgin

Excerpts from Robert Haas's message of Thu Nov 24 15:59:08 +0200 2011:
> 
> I think we could do something like:
> 
> postgresql://user:pw@host:port/database?param1=val1¶m2=val2¶m3=val3&...

I wonder if this should be allowed syntax (i.e. specify a user, but connect 
locally, so leave 'host' to be an empty string):

  postgresql://user@/

Furthermore, if we want to connect locally, but to a non-default port:

  postgresql://user@:5433/

I would also think that if one is to specify the password in the URI, and the 
password happens to contain the @-sign (e.g. "!@#$%^",) it should be 
percent-encoded, like:

  postgresql://user:!%40#$%^@/

Reasonable?
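
In case it helps the discussion, here is a tiny self-contained sketch of the
kind of encoding I have in mind (not libpq code; it simply percent-encodes
everything outside the RFC 3986 unreserved set before splicing the password
into a URI):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

static void
append_percent_encoded(char *dst, const char *src)
{
	const char *keep = "-._~";		/* unreserved characters besides alnum */

	for (; *src; src++)
	{
		unsigned char c = (unsigned char) *src;

		if (isalnum(c) || strchr(keep, c))
			*dst++ = (char) c;
		else
			dst += sprintf(dst, "%%%02X", c);
	}
	*dst = '\0';
}

int
main(void)
{
	char		buf[64];

	append_percent_encoded(buf, "!@#$%^");
	/* prints postgresql://user:%21%40%23%24%25%5E@/ */
	printf("postgresql://user:%s@/\n", buf);
	return 0;
}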

--
Alex



[HACKERS] GiST range-contained-by searches versus empty ranges

2011-11-26 Thread Tom Lane
I started to wonder why the test in range_gist_consistent_int() for
RANGESTRAT_CONTAINED_BY was "return true" (ie, search the entire index)
rather than range_overlaps, which is what is tested in the comparable
case in rtree_internal_consistent().  The regression tests showed me
how come: an empty-range index entry should be reported as being
contained by anything, but range_overlaps won't necessarily find such
entries.  (The rtree case is all right because there is no equivalent of
an empty range in boxes, circles, or polygons.)

I am not satisfied with this state of affairs though: range_contained_by
really ought to be efficiently indexable, but with the current coding
an index search is nearly useless.  Also, so far as I can see, the
current penalty function allows empty-range entries to be scattered
basically anywhere in the index, making a search for "= empty" pretty
inefficient too.

The first solution that comes to mind is to make the penalty and
picksplit functions forcibly segregate empty ranges from others, that is
a split will never put empty ranges together with non-empty ones.  Then,
we can assume that a non-empty internal node doesn't represent any empty
leaf entries, and avoid descending to it when it doesn't overlap the
target range.  Then the equality-search case could be improved too.

Thoughts, better ideas?

regards, tom lane



Re: [HACKERS] Prep object creation hooks, and related sepgsql updates

2011-11-26 Thread Dimitri Fontaine
Kohei KaiGai  writes:
> We still don't have clear direction of the way to implement external 
> permission
> checks on object creation time. So, please consider these patches are on the
> proof-of-concept stage; using prep-creation-hook to permission checks.

I wonder if you could implement that as an extension, assuming the command
trigger patch finds its way in.  What do you think?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Notes on implementing URI syntax for libpq

2011-11-26 Thread Alexander Shulgin

Excerpts from Peter Eisentraut's message of Thu Nov 24 22:05:09 +0200 2011:
> 
> On tor, 2011-11-24 at 15:43 +0200, Alexander Shulgin wrote:
> > Huh?  The service definitions are read from a local pg_service.conf,
> > and are specified by setting PGSERVICE (and PGSERVICEFILE) environment
> > variables, no?
> > 
> > What would you do with such URI if you need to other people to connect
> > to the same service?  Send them URI along with the pg_service.conf? 
> 
> A full URI would also rely on host names or IP addresses being the same
> everywhere.  It's all a matter of degree ...

True, but it is much more reasonable to expect that hostnames will resolve to 
the same addresses most of the time (save for zone changes propagation time.)

Still I can imagine where this may be useful, like local networks with shared 
pg_service.conf files.  And since we don't need to do anything special to 
support the behavior (i.e. postgresql:///?service=foo is going to work out of 
the box,) this seems to be a non-problem.

--
Alex



Re: [HACKERS] Inlining comparators as a performance optimisation

2011-11-26 Thread Peter Geoghegan
3 items are attached:

1. A spreadsheet, results.ods, which has results for automated
multiple runs of the "doubled up" dellstore orderlines queries (where
orderlines is 385 MB). Results are sorted, with the median (or the
lower middle value; I didn't take the mean of the two middle runs) result
for each run highlighted as representative. There are 3 builds of
Postgres (HEAD, not inlined, optimized), each on its own sheet in the
spreadsheet. The cache was warmed for each query, and subsequently the
query was run 20 times.

2. A python script I hacked together that should run on anything
around Python 2.5+, with a single dependency, psycopg2. Look at --help
to see how it works. This is the script that actually generated the
figures that went into the spreadsheet, by being run once for each
type of build. You can fairly easily play along at home with this, for
these and other queries. It will spit out CSV files. It is designed to
warm the cache (by default 3 times before each query) to eliminate
caching effects. You can add your own query to the python list to have
it run by the script, to generate the same set of figures for that
query. I'm rather curious as to how much of an impact this
optimisation will have on queries with unique nodes, joins and
grouping when they rely on a sort node for their input, in the real
world. Certainly, a query need not have an ORDER BY clause to see
benefits, perhaps substantial benefits. The logic to parse sort node
details is rather unsophisticated right now though, due to my focus on
the obvious ORDER BY case.

3. The revision of the patch that was actually tested, now with
inlining specialisations for the single scanKey case, and a
non-inlined specialisation for multiple scanKeys where we can still
benefit from eliding the SQL function call machinery for the first
scanKey, which is often almost as useful as being able to do so for
all scanKeys. It also selects a sorting specialisation when it can in
tuplesort_begin_heap, per Robert's suggestion.

Reviewers will want to comment out line 731 of tuplesort.c, "#define
optimize", to quickly get unoptimized behaviour for comparative
purposes. Furthermore, if you'd like to see what this would look like
without inlining, you can simply comment out assignments of inline
variants of specialisations (i.e. int4inlqsort_arg and
int8inlqsort_arg) in tuplesort_begin_heap.

It has been suggested that I'm chasing diminishing returns by
inlining, as I go further for a smaller benefit. Of course, that's
true. However, I believe that I'm not chasing them past the point
where that ceases to make sense, and these figures support that
contention - chasing diminishing returns is in the nature of this kind of
work. Here, the additional benefit of inlining accounts for over an
entire second shaved off a query that was originally 7056ms, so that's
not to be sniffed at.

I'll reiterate that qsort_arg has only been modified 4 times after its
initial commit in 2006, and these were all trivial changes. While the
way that I ape generic programming with the preprocessor is on the
ugly side, what I've done can be much more easily understood with
reference to qsort_arg itself. Robert said that duplicating the sort
function was "iffy". However, that already happened long ago, as we're
already maintaining both qsort_arg.c and qsort.c, and comments already
urge maintainers to keep the two consistent. That's not been a problem
though, because, as I've said, there has never been any call to make
substantive changes to either in all those years.

We could probably further reduce the code footprint of all of this by
having template_qsort_arg.h generate comparators directly. That might
be inflexible in a way that turns out to matter though, like if we
wanted to use this for non-HAVE_INT64_TIMESTAMP timestamps and had to
add NaN crud.
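
For readers who haven't looked at the patch, the preprocessor trick is roughly
of this shape (a deliberately minimal sketch, not the real
template_qsort_arg.h -- the actual header instantiates the whole qsort body so
the comparator is inlined into the sort loop instead of being called through a
function pointer the way library qsort does):

#include <stdlib.h>

#define DECLARE_TYPED_SORT(TYPE) \
static int \
TYPE##_cmp(const void *a, const void *b) \
{ \
	TYPE av = *(const TYPE *) a; \
	TYPE bv = *(const TYPE *) b; \
	return (av < bv) ? -1 : (av > bv) ? 1 : 0; \
} \
static void \
TYPE##_sort(TYPE *array, size_t n) \
{ \
	qsort(array, n, sizeof(TYPE), TYPE##_cmp); \
}

DECLARE_TYPED_SORT(int)		/* emits int_cmp() and int_sort() */
DECLARE_TYPED_SORT(long)	/* emits long_cmp() and long_sort() */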

My next step is to see how this goes with hundreds of sorts in the
hundreds of megabytes on a high-end server. I don't have immediate
access to one, but I'm working that out.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
From 92185762297ba9819eb7f1b208db03663d9fdfa8 Mon Sep 17 00:00:00 2001
From: peter2ndQuadrant 
Date: Sun, 20 Nov 2011 20:36:25 +
Subject: [PATCH] Initial commit of optimization

Stop directly using oid

Added int8 quicksort fast path specialisation, which can also be used in
place of F_TIMESTAMP_CMP for HAVE_INT64_TIMESTAMP builds.

Rebased, revised patch for -hackers, with timestamp and int8 fast path
sorting using the same int8 specialization.

Remove unneeded line

Rebase for -hackers

Descriminate against cases where nKeys > 1 when selecting sort
specialisation.

We now support fast path sorting with multiple scanKeys.

We now inline for one scanKey, but do not inline our comparetup_heap
replacement when multiple scanKeys are used (although this is at the
compiler's discretion). However, when multiple scanKeys are used, we
still use a specialisati

Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Tom Lane
Pavan Deolasee  writes:
> On Sat, Nov 26, 2011 at 10:43 PM, Robert Haas  wrote:
>> Furthermore, it's
>> hard to understand how this could be a net improvement in the general
>> case, because now both B and F are copying everything twice (once to
>> the shared area and one to backend-local memory) instead of just once
>> (to backend-local memory) and C and D are sleeping (yikes!).

> Yes, B and F pay a price of double copy. But I think it can be a net
> saving because C and D (and many more hopefully) don't need to
> recompute the snapshot again by going over a potentially large
> ProcArray.

Like Robert, I'm finding it hard to believe that this isn't going to be
a net pessimization in as many or more cases as it'd be a win.  Also,
didn't we just get rid of most of the issue of "going over a large
ProcArray" with the move of hot members to a separate array?

In the end, getting a snapshot is exactly about copying information out
of shared memory.  Perhaps it would be more productive to think about
how we could further modify the ProcArray representation to reduce the
impedance mismatch between it and the snapshot representation, rather
than introducing a second shared-memory copy of the same information.

regards, tom lane



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Pavan Deolasee
On Sat, Nov 26, 2011 at 10:43 PM, Robert Haas  wrote:
> On Sat, Nov 26, 2011 at 10:52 AM, Pavan Deolasee
>  wrote:
>> What we can do is when a transaction comes to compute its snapshot, it
>> checks if some other transaction is already computing a snapshot for
>> itself. If so, it just sleeps on the lock. When the other process
>> finishes computing the snapshot, it saves the snapshot in a shared
>> area and wakes up all processes waiting for the snapshot. All those
>> processes then just copy the snapshot from the shared area and they
>> are done. This will not only reduce the total CPU consumption by
>> avoiding repetitive work, but would also reduce the total time for
>> which ProcArrayLock is held in SHARED mode by avoiding pipeline of
>> GetSnapshotData calls. I am currently trying the shared work queue
>> mechanism to implement this, but I am sure we can do this in some
>> other way too.
>
> I don't quite understand how this is going to work.

I will try, keeping it simple.

> Transaction A
> ends, invalidating the shared snapshot.  Now transactions B, C, and D
> come along wishing to take snapshots.  B begins taking a snapshot, so
> C and D wait (blech!)

Yeah, C and D wait. But that's better than concurrently doing the same
computation.

> for that to be complete.  Then, they start
> copying the snapshot.

And they are holding the ProcArrayLock in shared mode.

> Meanwhile, transaction E ends, invalidating the
> shared snapshot,

E can't end until B, C and D are done with copying the snapshot.

> and then transaction F comes along, wanting to take a
> snapshot.  If F constructs a new shared snapshot, then doesn't that
> overwrite the same memory area C and D were busy copying?  It seems
> like F needs some way to know that C and D are done with the old
> shared snapshot before it starts writing a new one.

Right. And that can easily be achieved by holding a shared lock on the
ProcArray. The snapshot is invalidated while holding the exclusive lock and
set/copied while holding the shared lock. I am assuming setting a boolean
(valid/invalid) can safely be done with a shared lock, but I may be
wrong.

>  Furthermore, it's
> hard to understand how this could be a net improvement in the general
> case, because now both B and F are copying everything twice (once to
> the shared area and one to backend-local memory) instead of just once
> (to backend-local memory) and C and D are sleeping (yikes!).

Yes, B and F pay a price of double copy. But I think it can be a net
saving because C and D (and many more hopefully) don't need to
recompute the snapshot again by going over a potentially large
ProcArray. So as I demonstrated in the illustration, the total time
for which the ProcArray lock is held will be significantly less with a large
number of clients.

>  That
> could maybe be a gain at very high concurrency when spinlock
> contention is eating us alive, but it doesn't seem like a good idea in
> general.  Maybe I'm missing something; did you mean to attach a patch?
>

I have a patch that I am attaching. It contains some unrelated changes
(don't know why; maybe I took a diff against some other branch), but you
will get the idea. This needs improvement though because it just
checks if a valid shared snapshot is available and copies that. If
not, it goes ahead and computes one for itself. We need something more
intelligent to know that a snapshot computation is in progress and we
should wait instead of building our own. This patch had shown 15-20%
improvement on the HP box that we are using for the benchmarks.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


shared-snapshot-v2.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Modify pg_dump to use error-free memory allocation macros. This

2011-11-26 Thread Tom Lane
Bruce Momjian  writes:
> Basically, if we keep the existing functions in common.c, we are
> deciding that it is never worth moving C functions to new C files for
> source code clarity.  I was not sure we had made that decision in the
> past.

I once held hope that git would let us be more flexible about this,
but experimentation has shown that it's not significantly smarter than
CVS about back-patches in moved code :-(.

I don't want to say that we should *never* move code once it's released;
but I think the bar for that needs to be set pretty high, and this
particular case isn't clearing the bar for me.

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

2011-11-26 Thread Andrew Dunstan



On 10/05/2011 03:58 AM, Amit Khandekar wrote:

On 5 October 2011 12:29, Alex Hunsaker  wrote:


Find it attached. [ Note I didn't put the check inside the if (ret ==
utf8_str) as it seemed a bit cleaner (indentation wise) to have it
outside ]

I have no more issues with the patch.



Committed.

cheers

andrew



[HACKERS] Re: [COMMITTERS] pgsql: Modify pg_dump to use error-free memory allocation macros. This

2011-11-26 Thread Bruce Momjian
Tom Lane wrote:
> >> It is not possible to just link the old common.c into pg_restore and
> >> pg_dumpall because it contains calls to pg_dump-only functions.
> >> 
> >> Ideas?
> 
> > The only other idea I have is to keep most functions in the mis-named
> > common.c and move the memory functions into dumpmem.c and dumpmem.h and
> > include that in the other C files, but that doesn't help with long-term
> > code clarity.
> 
> I don't think there is any "long-term code clarity" gain here that is
> worth the maintenance headache.  Please put the existing functions back
> where they were.  Inventing new files for the new code seems fine.

Well, you do a lot more backpatching than I do, so I am willing to do as
you request, but looking at the git history for common.c, I don't see a
lot of backpatching for this file.  

Basically, if we keep the existing functions in common.c, we are
deciding that it is never worth moving C functions to new C files for
source code clarity.  I was not sure we had made that decision in the
past.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Robert Haas
On Sat, Nov 26, 2011 at 10:52 AM, Pavan Deolasee
 wrote:
> What we can do is when a transaction comes to compute its snapshot, it
> checks if some other transaction is already computing a snapshot for
> itself. If so, it just sleeps on the lock. When the other process
> finishes computing the snapshot, it saves the snapshot in a shared
> area and wakes up all processes waiting for the snapshot. All those
> processes then just copy the snapshot from the shared area and they
> are done. This will not only reduce the total CPU consumption by
> avoiding repetitive work, but would also reduce the total time for
> which ProcArrayLock is held in SHARED mode by avoiding pipeline of
> GetSnapshotData calls. I am currently trying the shared work queue
> mechanism to implement this, but I am sure we can do this in some
> other way too.

I don't quite understand how this is going to work.  Transaction A
ends, invalidating the shared snapshot.  Now transactions B, C, and D
come along wishing to take snapshots.  B begins taking a snapshot, so
C and D wait (blech!) for that to be complete.  Then, they start
copying the snapshot.  Meanwhile, transaction E ends, invalidating the
shared snapshot, and then transaction F comes along, wanting to take a
snapshot.  If F constructs a new shared snapshot, then doesn't that
overwrite the same memory area C and D were busy copying?  It seems
like F needs some way to know that C and D are done with the old
shared snapshot before it starts writing a new one.  Furthermore, it's
hard to understand how this could be a net improvement in the general
case, because now both B and F are copying everything twice (once to
the shared area and one to backend-local memory) instead of just once
(to backend-local memory) and C and D are sleeping (yikes!).  That
could maybe be a gain at very high concurrency when spinlock
contention is eating us alive, but it doesn't seem like a good idea in
general.  Maybe I'm missing something; did you mean to attach a patch?

I had a different idea for avoiding repeated snapshot computation:
save the snapshot in backend-private memory.  When a process takes
ProcArrayLock in exclusive mode, it increments a 64-bit counter.
When a process takes ProcArrayLock in shared mode, it can check
whether the counter has changed; if not, it can reuse the snapshot it
took before.  I think there might be a way to tinker with the snapshot
management so as to do this without any more copying than we do
presently, since CurrentSnapshotData and SecondarySnapshotData are
basically treated as scratch-pads anyhow.

Another idea would be to try to avoid taking ProcArrayLock at all.
For example, suppose we again have a 64-bit counter in shared memory.
A transaction wishing to do ProcArrayEndTransaction() takes
ProcArrayLock in exclusive mode, increments the counter, memory
barrier, clears its XID, memory barrier, increments the counter,
releases ProcArrayLock.  A transaction wishing to take a snapshot
first reads the counter.  If the value read from the counter is even,
then we continue like this: memory barrier, take snapshot, memory
barrier, recheck counter.  If we find that the counter is unchanged,
then nobody did ProcArrayEndTransaction() while we were taking the
snapshot, and we're good to go.  If the counter changed then we take
ProcArrayLock in shared mode and retake the snapshot.  (We also take
ProcArrayLock in shared mode if the initial value we read was odd.)
This seems like it would all but eliminate contention on the spinlock
protecting ProcArrayLock, but I'm not sure how much overhead it would
add in other cases.  In particular, if we have to retake the snapshot
more than very occasionally, it's not going to be good.
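
To spell the second idea out a bit (a sketch only -- PostgreSQL has no C11
atomics, and completion_counter/GetSnapshotSketch/TakeSnapshot* below are
made-up names, so this is just the control flow, not proposed code):

#include <stdatomic.h>
#include <stdint.h>

/* would live in shared memory; bumped twice (odd while in progress) by
 * ProcArrayEndTransaction() under exclusive ProcArrayLock */
static _Atomic uint64_t completion_counter;

static void TakeSnapshotUnlocked(void) { /* scan the ProcArray, no lock */ }
static void TakeSnapshotLocked(void) { /* fallback: ProcArrayLock in shared mode */ }

void
GetSnapshotSketch(void)
{
	uint64_t	before = atomic_load_explicit(&completion_counter,
											  memory_order_acquire);

	if ((before & 1) == 0)			/* even: no transaction-end in progress */
	{
		TakeSnapshotUnlocked();
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&completion_counter,
								 memory_order_relaxed) == before)
			return;					/* nobody ended a transaction meanwhile */
	}
	TakeSnapshotLocked();			/* counter odd or changed: retake under the lock */
}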

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] plpython SPI cursors

2011-11-26 Thread Steve Singer

On 11-11-23 01:58 PM, Jan Urbański wrote:

On 20/11/11 19:14, Steve Singer wrote:

On 11-10-15 07:28 PM, Jan Urbański wrote:

Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have to
slurp it all into memory (that's what plpy.execute does).

J


I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code


Responding now to all questions and attaching a revised patch based on 
your comments.




I've looked over the revised version of the patch and it now seems fine.

Ready for committer.



Do we like the name plpy.cursor, or would we rather call it something
like plpy.execute_cursor(...), plpy.cursor_open(...) or
plpy.create_cursor(...)?  Since we will be mostly stuck with the API once
we release 9.2, this is worth some opinions on. I like cursor() but if
anyone disagrees, now is the time.


We use plpy.subtransaction() to create Subxact objects, so I thought 
plpy.cursor() would be most appropriate.



This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that, and I don't see anything that precludes
someone else adding that later if they see a need.


My idea is to add keyword arguments to plpy.cursor() that will allow 
you to decide whether you want a scrollable cursor and after that 
provide a move() method.



The patch includes documentation updates that describe the new feature.
The Database Access page doesn't provide an API-style list of database
access functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one, and we should think about doing some
documentation refactoring. That should be done as a separate patch and
shouldn't be a barrier to committing this one.


Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet 
summoned force to overhaul them.



in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;



I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need 
to be

volatile here.


It looked like you could drop volatile there (and in 
PLy_spi_execute_plan, where this is copied from (did I mention there's 
quite some code duplication in PL/Python?)) but digging in git I found 
this commit:


http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 



that added the original volatile qualification, so I guess there's a 
reason.



line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, "s", &query))
+ return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()


No, PyArg_ParseTuple returns a string on the stack; I checked that
repeatedly creating a cursor with a plan argument does not leak memory
and that adding PyMem_Free there promptly leads to a segfault.




I tested both python 2.6 and 3 on a Linux system

[test cases demonstrating bugs]


Turns out it's a really bad idea to store pointers to Portal 
structures, because they get invalidated by the subtransaction abort 
hooks.


I switched to storing the cursor name and looking it up in the 
appropriate hash table every time it's used. The examples you sent 
(which I included as regression tests) now cause a ValueError to be 
raised with a message stating that the cursor has been created in an 
aborted subtransaction.


Not sure about the wording of the error message, though.

Thanks again for the review!

Cheers,
Jan







Re: [HACKERS] psql setenv command

2011-11-26 Thread Andrew Dunstan



On 11/20/2011 11:07 AM, Josh Kupershmidt wrote:

On Wed, Nov 2, 2011 at 5:36 PM, Andrew Dunstan  wrote:

Updated patch is attached - adding to Nov commitfest.

Review of the v2 patch:

* Submission Review
Patch applies with some fuzz and builds without warnings. I noticed
some tab characters being used in psql-ref.sgml where they shouldn't
be.



Fixed.




* Coding Review / Nitpicks
The patch implements \setenv via calls to unsetenv() and putenv(). As
the comment notes,

| That means we
| leak a bit of memory here, but not enough to worry about.

which seems true; the man page basically says there's nothing to be
done about the situation.

The calls to putenv() and unsetenv() are done without any real input
checking. So this admittedly-pathological case produces surprising
results:

test=# \setenv foo=bar baz
test=# \! echo $foo
bar=baz

Perhaps 'envvar' arguments with a '=' in the argument should just be
disallowed.


Done that way.



Also, should the malloc() of newval just use pg_malloc() instead?



Yes, also done.


Revised patch attached.

cheers

andrew


diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 01f57c4..39193d5 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2242,6 +2242,24 @@ lo_import 152801
 
 
   
+\setenv [ name [ value ] ]
+
+
+
+Sets the environment variable name to value, or if the 
+value is
+not supplied, unsets the environment variable. Example:
+
+foo=> \setenv PAGER less
+foo=> \setenv LESS -imx4F
+
+
+
+  
+
+  
 \sf[+] function_description 
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9cc73be..b5331aa 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1110,6 +1110,56 @@ exec_command(const char *cmd,
 		free(opt0);
 	}
 
+
+	/* \setenv -- set environment command */
+	else if (strcmp(cmd, "setenv") == 0)
+	{
+		char	   *envvar = psql_scan_slash_option(scan_state,
+  OT_NORMAL, NULL, false);
+		char	   *envval = psql_scan_slash_option(scan_state,
+  OT_NORMAL, NULL, false);
+
+		if (!envvar)
+		{
+			psql_error("\\%s: missing required argument\n", cmd);
+			success = false;
+		}
+		else if (strchr(envvar,'=') != NULL)
+		{
+			psql_error("\\%s: environment variable name must not contain '='\n",
+	   cmd);
+			success = false;
+		}
+		else if (!envval)
+		{
+			/* No argument - unset the environment variable */
+			unsetenv(envvar);
+			success = true;
+		}
+		else
+		{
+			/* Set variable to the value of the next argument */
+			int len = strlen(envvar) + strlen(envval) + 1;
+			char	   *newval = pg_malloc(len + 1);
+
+			if (!newval)
+			{
+psql_error("out of memory\n");
+exit(EXIT_FAILURE);
+			}
+			snprintf(newval, len+1, "%s=%s", envvar, envval);
+			putenv(newval);
+			success = true;
+			/*
+			 * Do not free newval here, it will screw up the environment
+			 * if you do. See putenv man page for details. That means we
+			 * leak a bit of memory here, but not enough to worry about.
+			 */
+		}
+		free(envvar);
+		free(envval);
+	}
+
 	/* \sf -- show a function's source code */
 	else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0)
 	{
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 4649e94..20fb5d7 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -158,7 +158,7 @@ slashUsage(unsigned short int pager)
 {
 	FILE	   *output;
 
-	output = PageOutput(93, pager);
+	output = PageOutput(94, pager);
 
 	/* if you add/remove a line here, change the row count above */
 
@@ -257,6 +257,7 @@ slashUsage(unsigned short int pager)
 
 	fprintf(output, _("Operating System\n"));
 	fprintf(output, _("  \\cd [DIR]  change the current working directory\n"));
+	fprintf(output, _("  \\setenv NAME [VALUE]   set environment variable, or unset it if no value provided\n"));
 	fprintf(output, _("  \\timing [on|off]   toggle timing of commands (currently %s)\n"),
 			ON(pset.timing));
 	fprintf(output, _("  \\! [COMMAND]   execute command in shell or start interactive shell\n"));



Re: [HACKERS] [COMMITTERS] pgsql: Modify pg_dump to use error-free memory allocation macros. This

2011-11-26 Thread Tom Lane
Bruce Momjian  writes:
> Bruce Momjian wrote:
>> Tom Lane wrote:
>>> object to arbitrarily moving a bunch of longstanding code from one file
>>> to another.  What that is mainly going to accomplish is creating a big
>>> headache whenever we have to back-patch fixes that touch that code
>>> ... and what exactly did it buy in return?

>> Yes, I didn't like that either.  The problem was that common.c was set up
>> to share code between pg_dump and a long-forgotten tool for Postgres 4.X
>> called pg4_dump (yes, pre-1996).  That code that was moved was really
>> not "common" in any current sense because it was used only by pg_dump
>> (not by pg_restore or pg_dumpall), so I moved it into dumpcatalog.c, and
>> put the really "common" code into common.c. (We could call it dumpmem.c
>> or something.)
>> 
>> Now, one approach would have been to rename common.c to dumpcatalog.c in
>> git, then create a new common.c, but that seems quite confusing to
>> people trying to reconstruct the history.

That will not help.  git is not smart enough to deal with back-patching
changes in code that has moved from one file to another.  If you force
this through it will mean manual adjustment of every back-patchable fix
that has to touch this code, for the next five or six years.  Yes, the
name "common.c" is a bit of a misnomer now, but that doesn't mean we
have to change it, especially since the file's header comment already
explained what it was common for (and could be modified some more if
you don't find that adequate).

>> It is not possible to just link the old common.c into pg_restore and
>> pg_dumpall because it contains calls to pg_dump-only functions.
>> 
>> Ideas?

> The only other idea I have is to keep most functions in the mis-named
> common.c and move the memory functions into dumpmem.c and dumpmem.h and
> include that in the other C files, but that doesn't help with long-term
> code clarity.

I don't think there is any "long-term code clarity" gain here that is
worth the maintenance headache.  Please put the existing functions back
where they were.  Inventing new files for the new code seems fine.

regards, tom lane



[HACKERS] Avoiding repeated snapshot computation

2011-11-26 Thread Pavan Deolasee
On some recent benchmarks and profile data, I saw GetSnapshotData
figures at or near the very top. For a smaller number of clients it
can account for 10-20% of time, but with more clients I have seen
it taking up as much as 40% of sample time. Unfortunately, the machine
on which I was running these tests is currently not available, so I
don't have the exact numbers, but the observation is essentially accurate.
Our recent work on separating the hot members of PGPROC into a separate
array would definitely reduce data cache misses and reduce the
GetSnapshotData time, but it probably still accounts for a large
enough critical section for a highly contended lock.

I think now that we have reduced the run time of the function itself,
we should now try to reduce the number of times the function is
called. Robert proposed a way to reduce the number of calls per
transaction. I think we can go one more step further and reduce the
number for across the transactions.

One major problem today could be because of the way LWLock works. If the
lock is currently held in SHARED mode by some backend and some other
backend now requests it in SHARED mode, it will immediately get it.
That's probably the right thing to do, because you don't want the reader
to really wait when the lock is readily available. But in the case of
GetSnapshotData(), every reader is doing exactly the same thing; they
are computing a snapshot based on the same shared state and would
compute exactly the same snapshot (if we ignore the fact that we don't
include the caller's XID in the xip array, but that's a minor detail). And
because of the way LWLock works, more and more readers would get in to
compute the snapshot, until the exclusive waiters get a window to
sneak in because more and more processes slowly start sleeping
for exclusive access. To depict it, in the diagram below four transactions
make overlapping calls to GetSnapshotData(), and hence the total critical
section starts when the first caller enters it and ends when the
last caller exits.

Txn1 ----[    SHARED    ]----------------------
Txn2 --------[    SHARED    ]------------------
Txn3 ------------[    SHARED    ]--------------
Txn4 ----------------[    SHARED    ]----------
         |<------ Total Time ------>|

Couple of ideas come to mind to solve this issue.

A snapshot once computed will remain valid for every call irrespective
of its origin until at least one transaction ends. So we can store the
last computed snapshot in some shared area and reuse it for all
subsequent GetSnapshotData calls. The shared snapshot will get
invalidated when some transaction ends by calling
ProcArrayEndTransaction(). I tried this approach and saw a 15%
improvement for 32-80 clients on the 32 core HP IA box with pgbench -s
100 -N tests. Not bad, but I think this can be improved further.

What we can do is when a transaction comes to compute its snapshot, it
checks if some other transaction is already computing a snapshot for
itself. If so, it just sleeps on the lock. When the other process
finishes computing the snapshot, it saves the snapshot in a shared
area and wakes up all processes waiting for the snapshot. All those
processes then just copy the snapshot from the shared area and they
are done. This will not only reduce the total CPU consumption by
avoiding repetitive work, but would also reduce the total time for
which ProcArrayLock is held in SHARED mode by avoiding pipeline of
GetSnapshotData calls. I am currently trying the shared work queue
mechanism to implement this, but I am sure we can do this in some
other way too.
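
A toy model of the caching part, in case the description above is unclear (a
pthread mutex stands in for ProcArrayLock, which really distinguishes shared
and exclusive acquisition; names and structure are illustrative, not the
attached patch):

#include <pthread.h>
#include <string.h>
#include <stdbool.h>

typedef struct ToySnapshot
{
	unsigned int xmin;
	unsigned int xmax;
	unsigned int xcnt;
	unsigned int xip[64];
} ToySnapshot;

static pthread_mutex_t snap_lock = PTHREAD_MUTEX_INITIALIZER;
static ToySnapshot shared_snapshot;
static bool shared_snapshot_valid = false;

static void
recompute_snapshot(ToySnapshot *snap)
{
	/* stands in for the ProcArray scan done by GetSnapshotData() */
	memset(snap, 0, sizeof(*snap));
}

void
toy_get_snapshot(ToySnapshot *out)
{
	pthread_mutex_lock(&snap_lock);
	if (!shared_snapshot_valid)
	{
		/* the first backend to arrive does the expensive scan ... */
		recompute_snapshot(&shared_snapshot);
		shared_snapshot_valid = true;
	}
	/* ... everyone arriving before the next invalidation just copies it */
	*out = shared_snapshot;
	pthread_mutex_unlock(&snap_lock);
}

void
toy_end_transaction(void)
{
	/* in the proposal this would happen in ProcArrayEndTransaction() */
	pthread_mutex_lock(&snap_lock);
	shared_snapshot_valid = false;
	pthread_mutex_unlock(&snap_lock);
}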

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] vpath builds and verbose error messages

2011-11-26 Thread Tom Lane
Peter Eisentraut  writes:
> On fre, 2011-11-18 at 09:44 -0500, Tom Lane wrote:
>> It wouldn't be that hard for elog.c to do strrchr(fname, '/') or
>> something like that,

> Here is a patch for that.  I would also like to backpatch this.

Hmmm ... is it possible that strrchr could change errno?  If so we'd
need to re-order the operations a bit.  Otherwise this seems fine.

regards, tom lane



Re: [HACKERS] proposal : backend startup hook / after logon trigger

2011-11-26 Thread Tomas Vondra
On 25.11.2011 17:48, Tom Lane wrote:
> "Tomas Vondra"  writes:
>> On 25 Listopad 2011, 2:44, Robert Haas wrote:
>>> I've thought of this before, but I'm not exactly clear on what the use
>>> cases are.
> 
>> The startup hook is useful for initializing an extension written in C,
>> when the extension was loaded from postgresql.conf. If you need to perform
>> the initialization for each db separately (so that you can decide whether
>> to apply the extension to the user/database), you need to do that after
>> the backend starts.
> 
> If you need that, just load the extension with local_preload_libraries.

I can't do that. I have an extension that needs to be loaded from
shared_preload_libraries (because it needs to ask for space in shared
memory segment), but I need to perform some additional initialization
for each backend.

Right now I solve that with an 'initialized' variable that's set to
false, and I have to check that before each action (and perform the init
if it's false). An after-startup hook would be much cleaner.
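
For reference, the workaround looks roughly like this (function names are
made up for illustration; the point is only that every entry point has to
repeat the check):

#include <stdbool.h>

static bool backend_initialized = false;

static void
extension_backend_init(void)
{
	/* the per-database / per-user setup that a startup hook would run */
}

static void
ensure_initialized(void)
{
	if (!backend_initialized)
	{
		extension_backend_init();
		backend_initialized = true;
	}
}

void
extension_do_something(void)
{
	ensure_initialized();		/* has to be repeated in every entry point */
	/* ... actual work ... */
}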

Tomas



Re: [HACKERS] psql \ir filename normalization

2011-11-26 Thread Bruce Momjian
Bruce Momjian wrote:
> Robert Haas wrote:
> > On Mon, Nov 21, 2011 at 2:30 PM, Bruce Momjian  wrote:
> > > Robert Haas wrote:
> > >> On Mon, Nov 21, 2011 at 1:05 PM, Bruce Momjian  wrote:
> > >> > Robert Haas wrote:
> > >> >> Argh.  The root of the problem here seems to be that
> > >> >> join_path_components() feels entitled to arbitrarily insert a pathname
> > >> >> separator at the front of the output string even if its first input
> > >> >> didn't begin with one originally.  Lame!
> > >> >
> > >> > The attached patch fixes this report, I think.
> > >>
> > >> Looks sensible.  Keep in mind we need to back-patch this.
> > >
> > > Oh.  Well, with no bug reports about it, does that make sense?  Do we
> > > have any code that relies on the old behavior?
> > 
> > Oh, wait a minute.  I was thinking \ir was in 9.1, but it's not: it
> > was committed after the branch.  So I guess this only needs to be
> > fixed in master, which is much less scary.
> 
> Agreed.  I realize it is wrong but I have no idea what impact fixing it
> in back branches might have, or whether people are relying on the broken
> behavior in some way.

Patch applied.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] vpath builds and verbose error messages

2011-11-26 Thread Peter Eisentraut
On fre, 2011-11-18 at 09:44 -0500, Tom Lane wrote:
> It wouldn't be that hard for elog.c to do strrchr(fname, '/') or
> something like that,

Here is a patch for that.  I would also like to backpatch this.

diff --git i/src/backend/utils/error/elog.c w/src/backend/utils/error/elog.c
index 9a99fc7..0131c06 100644
--- i/src/backend/utils/error/elog.c
+++ w/src/backend/utils/error/elog.c
@@ -343,7 +343,14 @@ errstart(int elevel, const char *filename, int lineno,
 	edata->elevel = elevel;
 	edata->output_to_server = output_to_server;
 	edata->output_to_client = output_to_client;
-	edata->filename = filename;
+	if (filename)
+	{
+		const char *slash;
+
+		/* keep only base name, useful especially for vpath builds */
+		slash = strrchr(filename, '/');
+		edata->filename = slash ? slash + 1 : filename;
+	}
 	edata->lineno = lineno;
 	edata->funcname = funcname;
 	/* the default text domain is the backend's */
@@ -1142,7 +1149,14 @@ elog_start(const char *filename, int lineno, const char *funcname)
 	}
 
 	edata = &errordata[errordata_stack_depth];
-	edata->filename = filename;
+	if (filename)
+	{
+		const char *slash;
+
+		/* keep only base name, useful especially for vpath builds */
+		slash = strrchr(filename, '/');
+		edata->filename = slash ? slash + 1 : filename;
+	}
 	edata->lineno = lineno;
 	edata->funcname = funcname;
 	/* errno is saved now so that error parameter eval can't change it */



Re: [HACKERS] Inserting heap tuples in bulk in COPY

2011-11-26 Thread Heikki Linnakangas

On 25.11.2011 23:32, Jeff Janes wrote:

On Fri, Nov 25, 2011 at 12:53 PM, Jeff Janes  wrote:


Thanks for this patch.  Doing bulk copies in parallel for me is now
limited by the IO subsystem rather than the CPU.

This patch, commit number d326d9e8ea1d69, causes fillfactor to be
ignored for the copy command.  Is this acceptable collateral damage?


Having looked into it a little bit, I think this might be an acceptable fix.


Thanks, applied. It was an oversight.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
