Re: [HACKERS] proposal: rounding up time value less than its unit.
On Thu, Sep 25, 2014 at 1:04 AM, Tom Lane wrote:
> David Johnston writes:
>> On Thu, Sep 25, 2014 at 12:46 AM, Tom Lane wrote:
>>> TBH I've also been wondering whether any of these proposed cures are
>>> better than the disease. The changes that can be argued to make the
>>> behavior more sane are also ones that introduce backwards compatibility
>>> issues of one magnitude or another. And I do not have a lot of sympathy
>>> for "let's not change anything except to throw an error in a case that
>>> seems ambiguous". That's mostly being pedantic, not helpful, especially
>>> seeing that the number of field complaints about it is indistinguishable
>>> from zero.
>> Then what does it matter that we'd choose to error-out?
> The number of complaints about the *existing* behavior is indistinguishable
> from zero (AFAIR anyway). It does not follow that deciding to throw an
> error where we did not before will draw no complaints.

Any change has the potential to draw complaints. For you it seems that "hey, I upgraded to 9.5 and my logs are being rotated out every minute now. I thought I had that turned off" is the desired complaint. Greg wants: "hey, my 1 hour log rotation is now happening every minute". If the error message is written correctly, most people upon seeing the error will simply fix their configuration and move on - regardless of whether they were proactive in doing so having read the release notes.

And, so what if we get some complaints? We already disallow the specified values (instead, turning them into zeros), so it isn't like we are further restricting user capabilities. If I were to commit a patch that didn't throw an error, I'd ask the author to provide an outline for each of the affected parameters and what it would mean (if possible) for a setting currently rounded to zero to instead be rounded to 1.

It's the same language in the release notes to get someone to avoid the error as it is to get them to not be impacted by the rounding change, so the difference is those people who would not read the release notes. The "error" outcome is simple, direct, and fool-proof - and conveys the fact that what they are asking for is invalid. It may be a little scary, but at least we can be sure nothing bad is happening in the system. If the argument is that there are too few possible instances out there to expend the effort to catalog every parameter, then there isn't enough surface area to care about throwing an error. And I still maintain that anyone setting values for a clean installation would rather be told their input is not valid instead of silently making it so.

Changing the unit for log_rotate_age when there is basically no one asking for that seems like the worse solution at face value; my +1 notwithstanding. I gave that mostly because if you are going to overhaul the system then making everything "ms" seems like the right thing to do. I think that such an effort would be a waste of resources.

I don't have clients, so maybe my acceptance of errors is overly optimistic - but in this case I truly don't see enough people even realizing that there was a change, and those few that do should be able to read the error and make the same change that they would need to make anyway if the rounding option is chosen.

My only concern here, and it probably applies to any solution, is that the change is implemented properly so that all possible user input areas throw the error correctly and as appropriate to the provided interface. That is, interactive use immediately errors out without changing the default values, while explicit/manual file changes cause the system to error at startup.

I haven't looked into the code parts of this, but I have to imagine this is already covered since, even with units and such, invalid input is still possible and needs to be addressed; this only adds one more check to that existing routine.

David J.
Re: [HACKERS] add modulo (%) operator to pgbench
On 9/24/14, 10:10 PM, Robert Haas wrote:
> I think you're making a mountain out of a molehill. I implemented this
> today in about three hours.

I think you're greatly understating your own efficiency at shift/reducing parser mountains down to molehills. Fabien even guessed the LOC size of the resulting patch with less than a 9% error. That's some top notch software metrics and development work there, boys; kudos all around.

Let's get this operator support whipped into shape; then we can add the 2 to 3 versions of the modulo operator needed to make the major use cases work. (There was never going to be just one hacked in with a quick patch that satisfied the multiple ways you can do this.) Then onto the last missing pieces of Fabien's abnormally distributed test workload vision.

He seems kind of stressed about the process lately; not sure what to say about it. Yes, the PostgreSQL community is hostile to short, targeted feature improvements unless they come already fit into a large, standards-compliant strategy for improving the database. That doesn't work well when approached with scratch-an-itch style development. Nothing we can really do about it.

-- Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: rounding up time value less than its unit.
David Johnston writes:
> On Thu, Sep 25, 2014 at 12:46 AM, Tom Lane wrote:
>> TBH I've also been wondering whether any of these proposed cures are
>> better than the disease. The changes that can be argued to make the
>> behavior more sane are also ones that introduce backwards compatibility
>> issues of one magnitude or another. And I do not have a lot of sympathy
>> for "let's not change anything except to throw an error in a case that
>> seems ambiguous". That's mostly being pedantic, not helpful, especially
>> seeing that the number of field complaints about it is indistinguishable
>> from zero.
> "Then what does it matter that we'd choose to error-out?"

The number of complaints about the *existing* behavior is indistinguishable from zero (AFAIR anyway). It does not follow that deciding to throw an error where we did not before will draw no complaints.

regards, tom lane
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Thu, Sep 25, 2014 at 12:46 AM, Tom Lane wrote:
> Gregory Smith writes:
>> I don't see any agreement on the real root of a problem here yet. That
>> makes gauging whether any smaller change leads that way or not fuzzy. I
>> personally would be fine doing nothing right now, instead waiting until
>> that's charted out--especially if the alternative is applying any of the
>> rounding or error throwing ideas suggested so far. I'd say that I hate
>> to be that guy who tells everyone else they're wrong, but we all know I
>> enjoy it.
>
> TBH I've also been wondering whether any of these proposed cures are
> better than the disease. The changes that can be argued to make the
> behavior more sane are also ones that introduce backwards compatibility
> issues of one magnitude or another. And I do not have a lot of sympathy
> for "let's not change anything except to throw an error in a case that
> seems ambiguous". That's mostly being pedantic, not helpful, especially
> seeing that the number of field complaints about it is indistinguishable
> from zero.

Then what does it matter that we'd choose to error-out?

> I am personally not as scared of backwards-compatibility problems as some
> other commenters: I do not think that there's ever been a commitment that
> postgresql.conf contents will carry forward blindly across major releases.
> So I'd be willing to break strict compatibility in the name of making the
> behavior less surprising. But the solutions that have been proposed that
> hold to strict backwards compatibility requirements are not improvements
> IMHO.

Or, put differently, the pre-existing behavior is fine, so don't fix what isn't broken. This patch simply fixes an oversight in the original implementation - that someone might try to specify an invalid value (i.e., between 0 and 1). If 0 and -1 are flags, then the minimum allowable value is 1. The logic should have been: range [1, something]; 0 (optionally); -1 (optionally).

Values abs(x) between 0 and 1 (exclusive) should be disallowed and, like an attempt to specify 0.5 (without units), should throw an error.

David J.
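To make the proposal concrete: the check being argued for is simply that a nonzero setting must not truncate to zero when converted into the parameter's base unit. A minimal sketch in C, with illustrative names only (this is not PostgreSQL's actual guc.c API):

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the proposed check: convert a user-supplied time
 * value (here already normalized to milliseconds) into the parameter's
 * base unit, and reject -- rather than silently round to zero -- any
 * nonzero value smaller than one base unit.  An exact 0 is preserved,
 * since 0 is often a "disabled" flag value.
 */
static bool
convert_time_value(double value_ms, int base_unit_ms, int *result)
{
    *result = (int) (value_ms / base_unit_ms);  /* truncating division */

    if (*result == 0 && value_ms != 0.0)
        return false;   /* caller reports: "value is less than 1 in the
                         * parameter's base unit" */
    return true;
}
```

A caller would turn the false return into the error being proposed here, instead of storing 0 and silently changing the parameter's meaning.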
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
At 2014-09-24 11:09:24 +0200, and...@2ndquadrant.com wrote:
> Why not add it to the stattuple extension, but as an independent
> function (and file)?

Thanks, that's a good idea. I'll do that.

> I think it's completely unacceptable to copy a visibility routine.

OK. Which visibility routine should I use, and should I try to create a variant that doesn't set hint bits? I don't have any reasoning for why it's safe to not hold a content lock. If there is one, I need help to find it. If not, I'll acquire a content lock. (If anyone can explain why it isn't safe, I would appreciate it.)

-- Abhijit
Re: [HACKERS] proposal: rounding up time value less than its unit.
On Thu, Sep 25, 2014 at 12:11 AM, Gregory Smith wrote:
> On 9/24/14, 6:45 PM, Peter Eisentraut wrote:
>> But then this proposal is just one of several others that break backward
>> compatibility, and does so in a more or less silent way. Then we might
>> as well pick another approach that gets closer to the root of the problem.
>
> I was responding to some desire to get a quick fix that cut off the worst
> of the problem, and the trade-offs of the other ideas bothered me even
> more. Obvious breakage is annoying, but small and really subtle version to
> version incompatibilities drive me really crazy. A 60X scale change to
> log_rotation_age will be big enough that impacted users should definitely
> notice, while not being so big it's scary. Rounding differences are small
> enough to miss. And I do see whacking the sole GUC that's set in minutes,
> which I was surprised to find is a thing that existed at all, as a useful
> step forward.

Why? Do you agree that a log_rotation_age value defined in seconds is sane? If your reasoning is that everything else is defined in "s" and "ms", then that is a developer, not a user, perspective. I'll buy into the "everything is defined in the smallest possible unit" approach - in which case the whole rounding thing becomes a non-issue - but if you leave some of these in seconds then we should still add an error if someone defines an insane millisecond value for those parameters.

> I can't agree with Stephen's optimism that some wording cleanup is all
> that's needed here. I predict it will be an annoying, multiple commit job
> just to get the docs right, describing both the subtle rounding change and
> how it will impact users. (Cry an extra tear for the translators)

Maybe I'm naive, but I'm seriously doubting this. From what I can tell the rounding isn't currently documented and really doesn't need much, if any, to be added. An error instead of rounding down to zero would be sufficient and self-contained.

"The value specified is less than 1 in the parameter's base unit"

> Meanwhile, I'd wager the entire work of my log_rotation_scale unit change
> idea, from code to docs to release notes, will take less time than just
> getting the docs done on any rounding change. Probably cause less field
> breakage and confusion in the end too.

You've already admitted there will be breakage. Your argument is that it will be obvious enough to notice. Why not just make it impossible?

> The bad case I threw out is a theorized one that shows why we can't go
> completely crazy with jerking units around, why 1000:1 adjustments are
> hard. I'm not actually afraid of that example being in the wild in any
> significant quantity. The resulting regression from a 60X scale change
> should not be so bad that people will suffer greatly from it either.
> Probably be like the big 10:1 change in default_statistics_target. Seemed
> scary that some people might be nailed by it; didn't turn out to be such a
> big deal.
>
> I don't see any agreement on the real root of a problem here yet. That
> makes gauging whether any smaller change leads that way or not fuzzy. I
> personally would be fine doing nothing right now, instead waiting until
> that's charted out--especially if the alternative is applying any of the
> rounding or error throwing ideas suggested so far. I'd say that I hate to
> be that guy who tells everyone else they're wrong, but we all know I enjoy
> it.

Maybe not: I contend the root problem is that while we provide sane unit specifications we've assumed that people will always be providing values greater than 1 - but if they don't, we silently use zero (a special value) instead of telling them they messed up (made an error). If the presence of config.d and such makes this untenable, then I'd say those features have a problem - not our current choice to define what sane is.

David J.
Re: [HACKERS] proposal: rounding up time value less than its unit.
Gregory Smith writes:
> I don't see any agreement on the real root of a problem here yet. That
> makes gauging whether any smaller change leads that way or not fuzzy. I
> personally would be fine doing nothing right now, instead waiting until
> that's charted out--especially if the alternative is applying any of the
> rounding or error throwing ideas suggested so far. I'd say that I hate
> to be that guy who tells everyone else they're wrong, but we all know I
> enjoy it.

TBH I've also been wondering whether any of these proposed cures are better than the disease. The changes that can be argued to make the behavior more sane are also ones that introduce backwards compatibility issues of one magnitude or another. And I do not have a lot of sympathy for "let's not change anything except to throw an error in a case that seems ambiguous". That's mostly being pedantic, not helpful, especially seeing that the number of field complaints about it is indistinguishable from zero.

I am personally not as scared of backwards-compatibility problems as some other commenters: I do not think that there's ever been a commitment that postgresql.conf contents will carry forward blindly across major releases. So I'd be willing to break strict compatibility in the name of making the behavior less surprising. But the solutions that have been proposed that hold to strict backwards compatibility requirements are not improvements IMHO.

regards, tom lane
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/24/14, 6:45 PM, Peter Eisentraut wrote:
> But then this proposal is just one of several others that break backward
> compatibility, and does so in a more or less silent way. Then we might
> as well pick another approach that gets closer to the root of the problem.

I was responding to some desire to get a quick fix that cut off the worst of the problem, and the trade-offs of the other ideas bothered me even more. Obvious breakage is annoying, but small and really subtle version to version incompatibilities drive me really crazy. A 60X scale change to log_rotation_age will be big enough that impacted users should definitely notice, while not being so big it's scary. Rounding differences are small enough to miss. And I do see whacking the sole GUC that's set in minutes, which I was surprised to find is a thing that existed at all, as a useful step forward.

I can't agree with Stephen's optimism that some wording cleanup is all that's needed here. I predict it will be an annoying, multiple-commit job just to get the docs right, describing both the subtle rounding change and how it will impact users. (Cry an extra tear for the translators.)

Meanwhile, I'd wager the entire work of my log_rotation_scale unit change idea, from code to docs to release notes, will take less time than just getting the docs done on any rounding change. Probably cause less field breakage and confusion in the end too.

The bad case I threw out is a theorized one that shows why we can't go completely crazy with jerking units around, why 1000:1 adjustments are hard. I'm not actually afraid of that example being in the wild in any significant quantity. The resulting regression from a 60X scale change should not be so bad that people will suffer greatly from it either. Probably be like the big 10:1 change in default_statistics_target. Seemed scary that some people might be nailed by it; didn't turn out to be such a big deal.

I don't see any agreement on the real root of a problem here yet. That makes gauging whether any smaller change leads that way or not fuzzy. I personally would be fine doing nothing right now, instead waiting until that's charted out--especially if the alternative is applying any of the rounding or error throwing ideas suggested so far. I'd say that I hate to be that guy who tells everyone else they're wrong, but we all know I enjoy it.

-- Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] interval typmodout is broken
Alvaro Herrera writes:
> Tom Lane wrote:
>> You sure about that? The grammar for INTERVAL is weird.
> Well, I tested what is taken on input, and yes I agree the grammar is
> weird (but not more weird than timestamp/timestamptz, mind). The input
> function only accepts the precision just after the INTERVAL keyword, not
> after the fieldstr:
> alvherre=# create table str (a interval(2) hour to minute);
> CREATE TABLE
> alvherre=# create table str2 (a interval hour to minute(2));
> ERROR: syntax error at or near "("
> LÍNEA 1: create table str2 (a interval hour to minute(2));
> ^

No, that's not about where it is, it's about what the field is: only "second" can have a precision. Our grammar is actually allowing stuff here that it shouldn't. According to the SQL spec, you could write

	interval hour(2) to minute

but this involves a "leading field precision", which we do not support and should definitely not be conflating with trailing-field precision. Or you could write

	interval hour to second(2)

which is valid and we support it. You can *not* write

	interval hour to minute(2)

either per spec or per our implementation; and

	interval(2) hour to minute

is 100% invalid per spec, even though our grammar goes out of its way to accept it.

In short, the typmodout function is doing what it ought to. It's the grammar that's broken. It looks to me like Tom Lockhart coded the grammar to accept a bunch of cases that he never got round to actually implementing reasonably. In particular, per SQL spec these are completely different animals:

	interval hour(2) to second
	interval hour to second(2)

but our grammar transforms them into the same thing. We ought to fix that...

regards, tom lane
Re: [HACKERS] identify_locking_dependencies is broken for schema-only dumps
On Wed, Sep 24, 2014 at 4:36 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, Sep 24, 2014 at 12:18 PM, Tom Lane wrote:
>>> Sorry for not paying attention sooner. After studying it for awhile,
>>> I think the change is probably all right but your proposed comment is
>>> entirely inadequate.
>> If you don't like that version, can you suggest something you would like
>> better?
> Perhaps like this:
>
> * We assume the entry requires exclusive lock on each TABLE or TABLE DATA
> * item listed among its dependencies. Originally all of these would have
> * been TABLE items, but repoint_table_dependencies would have repointed
> * them to the TABLE DATA items if those are present (which they might not
> * be, eg in a schema-only dump). Note that all of the entries we are
> * processing here are POST_DATA; otherwise there might be a significant
> * difference between a dependency on a table and a dependency on its
> * data, so that closer analysis would be needed here.

Works for me. I'll push with that text unless you'd like to take care of it.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] add modulo (%) operator to pgbench
On Wed, Sep 24, 2014 at 2:34 PM, Fabien COELHO wrote: > Sigh. > > How to transform a trivial 10 lines patch into a probably 500+ lines project > involving flex & bison & some non trivial data structures, and which may get > rejected on any ground... > > Maybe I'll set that as a student project. I think you're making a mountain out of a molehill. I implemented this today in about three hours. The patch is attached. It needs more testing, documentation, and possibly some makefile adjustments, but it seems to basically work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore index 489a2d6..aae819e 100644 --- a/contrib/pgbench/.gitignore +++ b/contrib/pgbench/.gitignore @@ -1 +1,3 @@ +/exprparse.c +/exprscan.c /pgbench diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile index b8e2fc8..67e2bf6 100644 --- a/contrib/pgbench/Makefile +++ b/contrib/pgbench/Makefile @@ -4,7 +4,7 @@ PGFILEDESC = "pgbench - a simple program for running benchmark tests" PGAPPICON = win32 PROGRAM = pgbench -OBJS = pgbench.o $(WIN32RES) +OBJS = pgbench.o exprparse.o $(WIN32RES) PG_CPPFLAGS = -I$(libpq_srcdir) PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS) @@ -18,8 +18,22 @@ subdir = contrib/pgbench top_builddir = ../.. include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk + +distprep: exprparse.c exprscan.c endif ifneq ($(PORTNAME), win32) override CFLAGS += $(PTHREAD_CFLAGS) endif + +# There is no correct way to write a rule that generates two files. +# Rules with two targets don't have that meaning, they are merely +# shorthand for two otherwise separate rules. To be safe for parallel +# make, we must chain the dependencies like this. The semicolon is +# important, otherwise make will choose the built-in rule for +# gram.y=>gram.c. 
+ +exprparse.h: exprparse.c ; + +# exprscan is compiled as part of exprparse +exprparse.o: exprscan.c diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y new file mode 100644 index 000..37eb538 --- /dev/null +++ b/contrib/pgbench/exprparse.y @@ -0,0 +1,96 @@ +%{ +/*- + * + * exprparse.y + * bison grammar for a simple expression syntax + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + *- + */ + +#include "postgres_fe.h" + +#include "pgbench.h" + +PgBenchExpr *expr_parse_result; + +static PgBenchExpr *make_integer_constant(int64 ival); +static PgBenchExpr *make_variable(char *varname); +static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, + PgBenchExpr *rexpr); + +%} + +%expect 0 +%name-prefix="expr_yy" + +%union +{ + int64 ival; + char *str; + PgBenchExpr *expr; +} + +%type expr +%type INTEGER +%type VARIABLE +%token INTEGER VARIABLE + +%left '+' '-' +%left '*' '/' '%' +%right UMINUS + +%% + +result: expr{ expr_parse_result = $1; } + +expr: '(' expr ')' { $$ = $2; } + | '+' expr %prec UMINUS { $$ = $2; } + | '-' expr %prec UMINUS + { $$ = make_op('-', make_integer_constant(0), $2); } + | expr '+' expr { $$ = make_op('+', $1, $3); } + | expr '-' expr { $$ = make_op('-', $1, $3); } + | expr '*' expr { $$ = make_op('*', $1, $3); } + | expr '/' expr { $$ = make_op('/', $1, $3); } + | expr '%' expr { $$ = make_op('%', $1, $3); } + | INTEGER{ $$ = make_integer_constant($1); } + | VARIABLE { $$ = make_variable($1); } + ; + +%% + +static PgBenchExpr * +make_integer_constant(int64 ival) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr->etype = ENODE_INTEGER_CONSTANT; + expr->u.integer_constant.ival = ival; + return expr; +} + +static PgBenchExpr * +make_variable(char *varname) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr->etype = ENODE_VARIABLE; + expr->u.variable.varname = varname; + return expr; +} + 
+static PgBenchExpr * +make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + + expr->etype = ENODE_OPERATOR; + expr->u.operator.operator = operator; + expr->u.operator.lexpr = lexpr; + expr->u.operator.rexpr = rexpr; + return expr; +} + +#include "exprscan.c" diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l new file mode 100644 index 000..4409f08 --- /dev/null +++ b/contrib/pgbench/exprscan.l @@ -0,0 +1,100 @@ +%{ +/*- + * + * exprscan.l + * a lexical scanner for a simple expression syntax + * + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + *- + */ + +s
Re: [HACKERS] delta relations in AFTER triggers
On 09/15/2014 10:25 PM, Kevin Grittner wrote:
> I broke out the changes from the previous patch in multiple commits
> in my repository on github:

*Thank you*. That gives me the incentive to pull it and test it. A nice patch series published in a git repo is so much easier to work with than a giant squashed patch as an attachment.

-- Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Immediate standby promotion
On Thu, Sep 25, 2014 at 5:36 AM, Simon Riggs wrote:
> We go to a lot of trouble to ensure data is successfully on disk and
> in WAL. I won't give that up, nor do I want to make it easier to lose
> data than it already is.

+1.

-- Michael
Re: [HACKERS] delta relations in AFTER triggers
On 08/28/2014 05:03 AM, Kevin Grittner wrote:
> I don't have to squint that hard -- I've always been comfortable
> with the definition of a table as a relation variable, and it's not
> too big a stretch to expand that to a tuplestore. ;-) In fact, I
> will be surprised if someone doesn't latch onto this to create a
> new "declared temporary table" that only exists within the scope of
> a compound statement (i.e., a BEGIN/END block). You would DECLARE
> them just like you would a scalar variable in a PL, and they would
> have the same scope.
>
> I'll take a look at doing this in the next couple days, and see
> whether doing it that way is as easy as it seems on the face of it.

Oracle's TABLE variables in PL/SQL are quite similar; it might be worth observing how they work for comparison.

-- Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan wrote:
> Probably not - it appears to make very little difference to
> unoptimized pass-by-reference types whether or not datum1 can be used
> (see my simulation of Kevin's worst case, for example [1]). Streaming
> through a not inconsiderable proportion of memtuples again is probably
> a lot worse. The datum1 optimization (which is not all that old) made
> a lot of sense when initially introduced, because it avoided chasing
> through a pointer for pass-by-value types. I think that's its sole
> justification, though.

Just to be clear -- I am blocked on this. Do you really prefer to restart copying heap tuples from scratch in the event of aborting, just to make sure that the datum1 representation is consistently either a pointer to text, or an abbreviated key? I don't think that the costs involved make that worth it, as I've said, but I'm not sure how to resolve that controversy.

I suggest that we focus on making sure the abort logic itself is sound. There probably hasn't been enough discussion of that. Once that is resolved, we can revisit the question of whether or not copying should restart to keep the datum1 representation consistent. I suspect that leaving that until later will be easier all around.

-- Peter Geoghegan
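For readers outside the thread, the technique under discussion can be sketched in a few lines of C. This is an illustrative toy, not PostgreSQL's sortsupport code: each tuple caches a fixed-size abbreviation of its key, comparisons are tried on the abbreviation first, and only ties fall back to the full value. It also shows why the representation must be consistent across all tuples -- an abbreviated key can only be compared against another abbreviated key, which is the crux of the "restart copying after abort" question.

```c
#include <string.h>

/* Toy sort tuple: an abbreviated key plus the authoritative value.
 * Names and layout are illustrative only. */
typedef struct
{
    unsigned int abbrev;    /* first 4 bytes of the string, big-endian */
    const char  *full;      /* full value, used to resolve ties */
} SortTuple;

/* Pack up to the first four bytes of a string into an unsigned int so
 * that unsigned comparison matches strcmp() ordering on those bytes. */
static unsigned int
abbreviate(const char *s)
{
    unsigned int v = 0;
    int          i;

    for (i = 0; i < 4 && s[i] != '\0'; i++)
        v |= (unsigned int) (unsigned char) s[i] << (8 * (3 - i));
    return v;
}

/* Compare abbreviations first; only equal abbreviations require
 * chasing the pointer to the full value. */
static int
cmp_tuples(const SortTuple *a, const SortTuple *b)
{
    if (a->abbrev != b->abbrev)
        return (a->abbrev < b->abbrev) ? -1 : 1;
    return strcmp(a->full, b->full);
}
```

If abbreviation were aborted partway through a copy, some tuples would hold abbreviations and others raw pointers in the same field, and a comparator like this would produce garbage -- hence the debate over restarting the copy versus tolerating the cost of keeping both paths.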
Re: [HACKERS] proposal: rounding up time value less than its unit.
On 9/23/14 11:55 PM, Gregory Smith wrote:
> Right now there are people out there who have configurations that look
> like this:
>
> log_rotation_age=60
>
> In order to get hourly rotation. These users will suffer some trauma
> should they upgrade to a version where this parameter now means a new
> log file pops every 60 seconds instead.

But then this proposal is just one of several others that break backward compatibility, and does so in a more or less silent way. Then we might as well pick another approach that gets closer to the root of the problem.
Re: [HACKERS] missing isinf declaration on solaris
On 9/24/14 4:26 PM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 9/24/14 9:21 AM, Tom Lane wrote:
>>> Agreed, but what about non-GCC compilers?
>> Stick AC_PROG_CC_C99 into configure.in.
>
> I think that's a bad idea, unless you mean to do it only on Solaris.
> If we do that unconditionally, we will pretty much stop getting any
> warnings about C99-isms on modern platforms. I am not aware that
> there has been any agreement to move our portability goalposts up
> to C99.

I don't disagree with that concern. But let's look at it this way. isinf() is a C99 function. If we want to have it, the canonical way is to put the compiler into C99 mode. Anything else is just pure luck (a.k.a. GNU extension). It's conceivable that on other platforms we fail to detect a system isinf() function because of that. The only way we learned about this is because the current configure check is inconsistent on Solaris.

The first thing to fix is the configure check. It shouldn't detect a function if it creates a warning when used. What will happen then is probably that isinf() isn't detected on Solaris, and we use the fallback implementation. Are we happy with that? Maybe. If not, well, then we need to put the compiler into C99 mode. But then it's not a Solaris-specific problem anymore, because Solaris will then behave like any other platform in this regard.
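For reference, the fallback in question can be tiny. Here is a sketch of a C89-compatible isinf() replacement; it is illustrative only and not necessarily identical to what PostgreSQL ships in src/port:

```c
#include <float.h>
#include <math.h>

/*
 * Sketch of a fallback isinf() for platforms whose libc does not expose
 * the C99 classification macro outside C99 mode.  On IEEE-754 systems,
 * an infinity compares strictly greater (or less) than every finite
 * double, including DBL_MAX; a NaN fails both comparisons, so it is
 * correctly reported as not infinite.
 */
static int
fallback_isinf(double x)
{
    return x > DBL_MAX || x < -DBL_MAX;
}
```

The point of Peter's argument is that whether this fallback or the system function gets used should fall out of an honest configure check, not out of whether a particular compiler happens to tolerate the declaration outside C99 mode.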
[HACKERS] Sloppy thinking about leakproof properties of opclass co-members
It strikes me that there's a significant gap in the whole "leakproof function" business, namely that no consideration has been given to planner-driven transformations of queries.

As an example, if we have "a = b" and "b = c", the planner may generate and apply "a = c" instead of one or both of those clauses. If a, b, c are not all the same type, "a = c" might involve an operator that's not either of the original ones. And it's possible that that operator is not leakproof where the original ones are. This could easily result in introducing non-leakproof operations into a secure subquery after pushdown of a clause that was marked secure.

Another example is that in attempting to make implication or refutation proofs involving operator clauses, the planner feels free to apply other members of the operator's btree opclass (if it's in one). I've not bothered to try to create a working exploit, but I'm pretty sure that this could result in a non-leakproof function being applied during planning of a supposedly secure subquery. It might be that that couldn't leak anything worse than constant values within the query tree, but perhaps it could leak data values from a protected table's pg_statistic entries.

ISTM that the most appropriate solution here is to insist that all or none of the members of an operator class be marked leakproof. (Possibly we could restrict that to btree opclasses, but I'm not sure any exception is needed for other index types.) I looked for existing violations of this precept, and unfortunately found a *lot*. For example, texteq is marked leakproof but its fellow text comparison operators aren't. Is that really sane?
Here's a query to find problematic cases:

select p1.proname, o1.oprname, p2.proname, o2.oprname, a1.amopfamily
from pg_proc p1, pg_operator o1, pg_amop a1,
     pg_proc p2, pg_operator o2, pg_amop a2
where p1.oid = o1.oprcode and p2.oid = o2.oprcode
  and o1.oid = a1.amopopr and o2.oid = a2.amopopr
  and a1.amopfamily = a2.amopfamily
  and p1.proleakproof and not p2.proleakproof;

Oh ... and just to add insult to injury, the underlying opclass support functions (such as textcmp and hashtext) are generally not marked leakproof even when the operators they may be executed instead of are. This is even more silly. regards, tom lane
Re: [HACKERS] interval typmodout is broken
Tom Lane wrote: > Alvaro Herrera writes: > > I just noticed when working on DDL deparsing that the typmodout routine > > for intervals is broken. The code uses > > > if (precision != INTERVAL_FULL_PRECISION) > snprintf(res, 64, "%s(%d)", fieldstr, precision); > else > snprintf(res, 64, "%s", fieldstr); > > > which puts the parenthesized number after the textual name; but the > > grammar only takes it the other way around. > > You sure about that? The grammar for INTERVAL is weird. I believe > the output we're trying to produce here is something like > > INTERVAL HOUR TO SECOND(2) > > where "fieldstr" would be " HOUR TO SECOND" and "precision" would > give the fractional-second precision. Well, I tested what is taken on input, and yes I agree the grammar is weird (but not more weird than timestamp/timestamptz, mind). The input function only accepts the precision just after the INTERVAL keyword, not after the fieldstr:

alvherre=# create table str (a interval(2) hour to minute);
CREATE TABLE
alvherre=# create table str2 (a interval hour to minute(2));
ERROR:  syntax error at or near "("
LINE 1: create table str2 (a interval hour to minute(2));
                                                     ^

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of GetUserId() Usage
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > I think the case for pgstat_get_backend_current_activity() and > pg_stat_get_activity and the other pgstatfuncs.c callers is easy to make > and seems acceptable to me; but I would leave pg_signal_backend out of > that discussion, because it has a potentially harmful side effect. By > requiring SET ROLE you add an extra layer of protection against > mistakes. (Hopefully, pg_signal_backend() is not a routine thing for > well-run systems, which means human intervention, and therefore the room > for error isn't insignificant.) While I certainly understand where you're coming from, I don't really buy into it. Yes, cancelling a query (the only thing normal users can do anyway - they can't terminate backends) could mean the loss of any in-progress work, but it's not like 'rm' and I don't see that it needs to require extra hoops for individuals to go through. Thanks! Stephen
Re: [HACKERS] interval typmodout is broken
Alvaro Herrera writes: > I just noticed when working on DDL deparsing that the typmodout routine > for intervals is broken. The code uses > if (precision != INTERVAL_FULL_PRECISION) > snprintf(res, 64, "%s(%d)", fieldstr, precision); > else > snprintf(res, 64, "%s", fieldstr); > which puts the parenthesized number after the textual name; but the > grammar only takes it the other way around. You sure about that? The grammar for INTERVAL is weird. I believe the output we're trying to produce here is something like INTERVAL HOUR TO SECOND(2) where "fieldstr" would be " HOUR TO SECOND" and "precision" would give the fractional-second precision. regards, tom lane
Re: [HACKERS] missing isinf declaration on solaris
Tom Lane wrote: > Peter Eisentraut writes: > > On 9/24/14 9:21 AM, Tom Lane wrote: > >> Agreed, but what about non-GCC compilers? > > > Stick AC_PROG_CC_C99 into configure.in. > > I think that's a bad idea, unless you mean to do it only on Solaris. > If we do that unconditionally, we will pretty much stop getting any > warnings about C99-isms on modern platforms. I am not aware that > there has been any agreement to move our portability goalposts up > to C99. AFAIK we cannot move all the way to C99, because MSVC doesn't support it. Presumably we're okay on the isinf() front because MSVC inherited it from somewhere else (it's on BSD 4.3 according to my Linux manpage), but other things are known not to work. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Immediate standby promotion
On 18 September 2014 01:22, Robert Haas wrote: >> "fast" promotion was actually a supported option in r8 of Postgres but >> this option was removed when we implemented streaming replication in >> r9.0 >> >> The *rough* requirement is sane, but that's not the same thing as >> saying this exact patch makes sense. > > Granted. Fair point. > >> If you are paused and you can see that WAL up ahead is damaged, then >> YES, you do want to avoid applying it. That is possible by setting a >> PITR target so that recovery stops at a precise location specified by >> you. As an existing option is it better than the blunt force trauma >> suggested here. > > You can pause at a recovery target, but then what if you want to go > read/write at that point? Or what if you've got a time-delayed > standby and you want to break replication so that it doesn't replay > the DROP TABLE students that somebody ran on the master? It doesn't > have to be that WAL is unreadable or corrupt; it's enough for it to > contain changes you wish to avoid replaying. > >> If you really don't care, just shutdown server, resetxlog and start >> her up - again, no need for new option. > > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it" > seems like a friendlier interface than making somebody shut down the > server, run pg_resetxlog, and start it up again. It makes sense to go from paused --> promoted. It doesn't make sense to go from normal running --> promoted, since that is just random data loss. I very much understand the case where somebody is shouting "get the web site up, we are losing business". Implementing a feature that allows people to do exactly what they asked (go live now), but loses business transactions that we thought had been safely recorded is not good. It implements only the exact request, not its actual intention. Any feature that lumps both cases together is wrongly designed and will cause data loss. 
We go to a lot of trouble to ensure data is successfully on disk and in WAL. I won't give that up, nor do I want to make it easier to lose data than it already is. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] identify_locking_dependencies is broken for schema-only dumps
Robert Haas writes: > On Wed, Sep 24, 2014 at 12:18 PM, Tom Lane wrote: >> Sorry for not paying attention sooner. After studying it for awhile, >> I think the change is probably all right but your proposed comment is >> entirely inadequate. > If you don't like that version, can you suggest something you would like > better? Perhaps like this:

 * We assume the entry requires exclusive lock on each TABLE or TABLE DATA
 * item listed among its dependencies. Originally all of these would have
 * been TABLE items, but repoint_table_dependencies would have repointed
 * them to the TABLE DATA items if those are present (which they might not
 * be, eg in a schema-only dump). Note that all of the entries we are
 * processing here are POST_DATA; otherwise there might be a significant
 * difference between a dependency on a table and a dependency on its
 * data, so that closer analysis would be needed here.

regards, tom lane
Re: [HACKERS] RLS feature has been committed
Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: > Some random comments after a quick read-through of the patch: Many thanks for this, again. I've pushed updates along with the fix for relcache which was identified by the buildfarm. Thanks, Stephen
[HACKERS] interval typmodout is broken
I just noticed when working on DDL deparsing that the typmodout routine for intervals is broken. The code uses

if (precision != INTERVAL_FULL_PRECISION)
    snprintf(res, 64, "%s(%d)", fieldstr, precision);
else
    snprintf(res, 64, "%s", fieldstr);

which puts the parenthesized number after the textual name; but the grammar only takes it the other way around. This has been wrong since commit 5725b9d9afc8 dated Dec 30 2006, which introduced the whole notion of type-specific typmod output functions. I don't understand how come nobody has noticed this in eight years. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] missing isinf declaration on solaris
Peter Eisentraut writes: > On 9/24/14 9:21 AM, Tom Lane wrote: >> Agreed, but what about non-GCC compilers? > Stick AC_PROG_CC_C99 into configure.in. I think that's a bad idea, unless you mean to do it only on Solaris. If we do that unconditionally, we will pretty much stop getting any warnings about C99-isms on modern platforms. I am not aware that there has been any agreement to move our portability goalposts up to C99. regards, tom lane
Re: [HACKERS] missing isinf declaration on solaris
On 9/24/14 9:21 AM, Tom Lane wrote: > Oskari Saarenmaa writes: >> ... so to enable XPG6 we'd need to use C99 mode anyway. > > OK. > >> Could we just use >> -std=gnu99 (with -fgnu89-inline if required) with GCC on Solaris? ISTM >> it would be cleaner to just properly enable c99 mode rather than define >> an undocumented macro to use a couple of c99 declarations. > > Agreed, but what about non-GCC compilers? Stick AC_PROG_CC_C99 into configure.in.
Re: [HACKERS] identify_locking_dependencies is broken for schema-only dumps
On Wed, Sep 24, 2014 at 12:18 PM, Tom Lane wrote: > Robert Haas writes: >> If there are no comments on this soon-ish, I'm going to push and >> back-patch the patch I attached. > > Sorry for not paying attention sooner. After studying it for awhile, > I think the change is probably all right but your proposed comment is > entirely inadequate. There are extremely specific reasons why this > works, and you removed an existing comment about that and replaced it > with nothing but a wishy-washy "maybe". Well, I could write something like this:

 * We assume the item requires exclusive lock on each TABLE or TABLE DATA
 * item listed among its dependencies. (This was originally a dependency on
 * the TABLE, but fix_dependencies may have repointed it to the data item.
 * In a schema-only dump, however, this will not have been done.)

If you don't like that version, can you suggest something you would like better? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 2014-09-09 17:54:03 -0400, Robert Haas wrote: > So, that's committed, then. I think we should pick something that uses > spinlocks and is likely to fail spectacularly if we haven't got this > totally right yet, and de-volatilize it. And then watch to see what > turns red in the buildfarm and/or which users start screaming. I'm > inclined to propose lwlock.c as a candidate, since that's very widely > used and a place where we know there's significant contention. Did you consider removing the volatiles from bufmgr.c? There's lots of volatiles in there and most of them don't seem to have been added in a principled way. I'm looking at my old patch for lockless pin/unpin of buffers and it'd look a lot cleaner without. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] add modulo (%) operator to pgbench
On 09/24/2014 09:34 PM, Fabien COELHO wrote: The idea of a modulo operator was not rejected, we'd just like to have the infrastructure in place first. Sigh. How to transform a trivial 10 lines patch into a probably 500+ lines project involving flex & bison & some non trivial data structures, and which may get rejected on any ground... That's how we get good features. It's very common for someone to need X, and to post a patch that does X. Other people pop up that need Y and Z which are similar, and you end up implementing those too. It's more work for you in the short term, but it benefits everyone in the long run. - Heikki
[HACKERS] json_object_agg return on no rows
Probably due to an oversight on my part, json_object_agg currently returns a json object with no fields rather than NULL, which is what the docs say it returns and which would be consistent with all other aggregates except count(). I don't think we ever discussed this, so it's probably just a slip up. I'm going to change it to return NULL in this case unless there is some objection. cheers andrew
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 14:28:18 -0400, Tom Lane wrote: > Note that the spinlock code separates s_lock.h (hardware implementations) > from spin.h (a hardware-independent abstraction layer). Perhaps there's > room for a similar separation here. Luckily that separation exists ;). The hardware-dependent bits are in different files than the platform-independent functionality on top of them (atomics/generic.h). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] add modulo (%) operator to pgbench
The idea of a modulo operator was not rejected, we'd just like to have the infrastructure in place first. Sigh. How to transform a trivial 10 lines patch into a probably 500+ lines project involving flex & bison & some non trivial data structures, and which may get rejected on any ground... Maybe I'll set that as a student project. -- Fabien.
Re: [HACKERS] better atomics - v0.6
Heikki Linnakangas writes: > On 09/24/2014 07:57 PM, Andres Freund wrote: >> On 2014-09-24 12:44:09 -0400, Tom Lane wrote: >>> I think the question is more like "what in the world happened to confining >>> ourselves to a small set of atomics". >> I fail to see why the existence of a wrapper around compare-exchange >> (which is one of the primitives we'd agreed upon) runs counter to >> the agreement that we'll only rely on a limited number of atomics on the >> hardware level? > It might be a useful function, but if there's no hardware implementation > for it, it doesn't belong in atomics.h. We don't want to turn it into a > general library of useful little functions. Note that the spinlock code separates s_lock.h (hardware implementations) from spin.h (a hardware-independent abstraction layer). Perhaps there's room for a similar separation here. I tend to agree with Heikki that wrappers around compare-exchange ought not be conflated with compare-exchange itself, even if there might theoretically be architectures where the wrapper function could be implemented directly. regards, tom lane
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 21:19:06 +0300, Heikki Linnakangas wrote: > On 09/24/2014 07:57 PM, Andres Freund wrote: > >On 2014-09-24 12:44:09 -0400, Tom Lane wrote: > >>Andres Freund writes: > >>>On 2014-09-24 18:55:51 +0300, Heikki Linnakangas wrote: > There doesn't seem to be any hardware implementations of that in the > patch. > Is there any architecture that has an instruction or compiler intrinsic > for > that? > >> > >>>You can implement it rather efficiently on ll/sc architectures. But I > >>>don't really think it matters. I prefer add_until (I've seen it named > >>>saturated add before as well) to live in the atomics code, rather than > >>>reimplement it in atomics employing code. I guess you see that > >>>differently? > >> > >>I think the question is more like "what in the world happened to confining > >>ourselves to a small set of atomics". > > > >I fail to see why the existence of a wrapper around compare-exchange > >(which is one of the primitives we'd agreed upon) runs counter to > >the agreement that we'll only rely on a limited number of atomics on the > >hardware level? > > It might be a useful function, but if there's no hardware implementation for > it, it doesn't belong in atomics.h. We don't want to turn it into a general > library of useful little functions. Uh. It belongs there because it *atomically* does a saturated add? That's precisely what atomics.h is about, so I don't see why it'd belong anywhere else. > >>I doubt either that this exists > >>natively anywhere, or that it's so useful that we should expect platforms > >>to have efficient implementations. > > Googling around, ARM seems to have a QADD instruction that does that. But > AFAICS it's not an atomic instruction that you could use for synchronization > purposes, just a regular instruction. On ARM - and many other LL/SC architectures - you can implement it atomically using LL/SC.
In pseudocode:

while (true)
{
    load_linked(mem, reg);
    if (reg + add < limit)
        reg = reg + add;
    else
        reg = limit;
    if (store_conditional(mem, reg))
        break;
}

That's how pretty much *all* atomic instructions work on such architectures. > >It's useful for my work to get rid of most LockBufHdr() calls (to > >manipulate usagecount locklessly). That's why I added it. We can delay > >it till that patch is ready, but I don't really see the benefit. > > Yeah, please leave it out for now, we can argue about it later. Even if we > want it in the future, it would be just dead, untested code now. Ok, will remove it for now. I won't repost a version with it removed, as removing a function as the only change doesn't seem to warrant reposting it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] add modulo (%) operator to pgbench
No, it depends totally on the application. For financial and physical inventory purposes where I have had occasion to use it, the properties which were important were: [...] Hmmm. Probably I'm biased towards my compiler with an integer linear flavor field, where C-like "%" is always a pain, however you look at it. I'm not sure of physical inventories with negative numbers though. In accounting, I thought that a negative number was a positive number with "debit" written above. In finance, no problem to get big deficits :-) Now about the use case, I'm not sure that you would like to write your financial and physical inventory stuff within a pgbench test script, whereas in such a script I do expect when doing a modulo with the size of a table to have a positive result so that I can expect to find a tuple there, hence the desired "positive remainder" property for negative dividends. -- Fabien.
Re: [HACKERS] better atomics - v0.6
On 09/24/2014 07:57 PM, Andres Freund wrote:
> On 2014-09-24 12:44:09 -0400, Tom Lane wrote:
>> Andres Freund writes:
>>> On 2014-09-24 18:55:51 +0300, Heikki Linnakangas wrote:
>>>> There doesn't seem to be any hardware implementations of that in the patch. Is there any architecture that has an instruction or compiler intrinsic for that?
>>> You can implement it rather efficiently on ll/sc architectures. But I don't really think it matters. I prefer add_until (I've seen it named saturated add before as well) to live in the atomics code, rather than reimplement it in atomics employing code. I guess you see that differently?
>> I think the question is more like "what in the world happened to confining ourselves to a small set of atomics".
> I fail to see why the existence of a wrapper around compare-exchange (which is one of the primitives we'd agreed upon) runs counter to the agreement that we'll only rely on a limited number of atomics on the hardware level?

It might be a useful function, but if there's no hardware implementation for it, it doesn't belong in atomics.h. We don't want to turn it into a general library of useful little functions.

>> I doubt either that this exists natively anywhere, or that it's so useful that we should expect platforms to have efficient implementations.

Googling around, ARM seems to have a QADD instruction that does that. But AFAICS it's not an atomic instruction that you could use for synchronization purposes, just a regular instruction.

> It's useful for my work to get rid of most LockBufHdr() calls (to manipulate usagecount locklessly). That's why I added it. We can delay it till that patch is ready, but I don't really see the benefit.

Yeah, please leave it out for now, we can argue about it later. Even if we want it in the future, it would be just dead, untested code now. - Heikki
Re: [HACKERS] add modulo (%) operator to pgbench
Kevin Grittner writes: > Assuming that all values are integers, for: > x = a / b; > y = a % b; > If b is zero either statement must generate an error. > If a and b have the same sign, x must be positive; else x must be negative. > It must hold that abs(x) is equal to abs(a) / abs(b). > It must hold that ((x * b) + y) is equal to a. Not sure about the third of those statements, but the last one is definitely a requirement. I think the only defensible choice, really, is that % should be defined so that a = ((a / b) * b) + (a % b). It is perfectly reasonable to provide other division/modulus semantics as functions, preferably in matching pairs that also satisfy this axiom. But the two operators need to agree, or you'll have surprised users. regards, tom lane
Re: [HACKERS] add modulo (%) operator to pgbench
Fabien COELHO wrote: >>> So my opinion is that this small modulo operator patch is both useful and >>> harmless, so it should be committed. >> >> You've really failed to make that case --- in particular, AFAICS there is >> not even consensus on the exact semantics that the operator should have. > > There is. Basically whatever with a positive remainder when the divisor is > positive is fine. The only one to avoid is the dividend signed version, > which happen to be the one of C and SQL, a very unfortunate choice in both > case as soon as you have negative numbers. No, it depends totally on the application. For financial and physical inventory purposes where I have had occasion to use it, the properties which were important were:

Assuming that all values are integers, for:

    x = a / b;
    y = a % b;

If b is zero either statement must generate an error.
If a and b have the same sign, x must be positive; else x must be negative.
It must hold that abs(x) is equal to abs(a) / abs(b).
It must hold that ((x * b) + y) is equal to a.

This is exactly what the languages I was using did, and I was glad. I find it convenient that C and SQL behave this way. You are proposing that these not all hold, which in a lot of situations could cause big problems. You seem familiar enough with your own use case that I believe you when you say you don't want these semantics, but that just points out the need to support both. >> Then we could add a modulo operator with whatever semantics seem >> most popular, and some function(s) for the other semantics, and >> there would not be so much riding on choosing the "right" semantics. > > The semantics is clear. I disagree. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 12:44:09 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2014-09-24 18:55:51 +0300, Heikki Linnakangas wrote: > >> There doesn't seem to be any hardware implementations of that in the patch. > >> Is there any architecture that has an instruction or compiler intrinsic for > >> that? > > > You can implement it rather efficiently on ll/sc architectures. But I > > don't really think it matters. I prefer add_until (I've seen it named > > saturated add before as well) to live in the atomics code, rather than > > reimplement it in atomics employing code. I guess you see that > > differently? > > I think the question is more like "what in the world happened to confining > ourselves to a small set of atomics". I fail to see why the existence of a wrapper around compare-exchange (which is one of the primitives we'd agreed upon) runs counter to the agreement that we'll only rely on a limited number of atomics on the hardware level? > I doubt either that this exists > natively anywhere, or that it's so useful that we should expect platforms > to have efficient implementations. It's useful for my work to get rid of most LockBufHdr() calls (to manipulate usagecount locklessly). That's why I added it. We can delay it till that patch is ready, but I don't really see the benefit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Final Patch for GROUPING SETS
> "Heikki" == Heikki Linnakangas writes: Heikki> There's been a lot of discussion and I haven't followed it in Heikki> detail. Andrew, there were some open questions, but have you Heikki> gotten enough feedback so that you know what to do next? I was holding off on posting a recut patch with the latest EXPLAIN formatting changes (which are basically cosmetic) until it became clear whether RLS was likely to be reverted or kept (we have a tiny but irritating conflict with it, in the regression test schedule file where we both add to the same list of tests). Other than that there is nothing for Atri and me to do next but wait on a proper review. The feedback and discussion has been almost all about cosmetic details; the only actual issues found have been a trivial omission from pg_stat_statements, and a slightly suboptimal planning of sort steps, both long since fixed. What we have not had:

- anything more than a superficial review
- any feedback over the acceptability of our chained-sorts approach for doing aggregations with differing sort orders
- any decision about the question of reserved words and/or possibly renaming contrib/cube (and what new name to use if so)

-- Andrew (irc:RhodiumToad)
Re: [HACKERS] better atomics - v0.6
Andres Freund writes: > On 2014-09-24 18:55:51 +0300, Heikki Linnakangas wrote: >> There doesn't seem to be any hardware implementations of that in the patch. >> Is there any architecture that has an instruction or compiler intrinsic for >> that? > You can implement it rather efficiently on ll/sc architectures. But I > don't really think it matters. I prefer add_until (I've seen it named > saturated add before as well) to live in the atomics code, rather than > reimplement it in atomics employing code. I guess you see that > differently? I think the question is more like "what in the world happened to confining ourselves to a small set of atomics". I doubt either that this exists natively anywhere, or that it's so useful that we should expect platforms to have efficient implementations. regards, tom lane
Re: [HACKERS] identify_locking_dependencies is broken for schema-only dumps
Robert Haas writes: > If there are no comments on this soon-ish, I'm going to push and > back-patch the patch I attached. Sorry for not paying attention sooner. After studying it for awhile, I think the change is probably all right but your proposed comment is entirely inadequate. There are extremely specific reasons why this works, and you removed an existing comment about that and replaced it with nothing but a wishy-washy "maybe". regards, tom lane
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 18:55:51 +0300, Heikki Linnakangas wrote:
> On 09/24/2014 03:37 PM, Andres Freund wrote:
>>> +/*
>>> + * pg_fetch_add_until_u32 - saturated addition to variable
>>> + *
>>> + * Returns the value of ptr after the arithmetic operation.
>>> + *
>>> + * Full barrier semantics.
>>> + */
>>> +STATIC_IF_INLINE uint32
>>> +pg_atomic_fetch_add_until_u32(volatile pg_atomic_uint32 *ptr, int32 add_,
>>> +                              uint32 until)
>>> +{
>>> +    CHECK_POINTER_ALIGNMENT(ptr, 4);
>>> +    return pg_atomic_fetch_add_until_u32_impl(ptr, add_, until);
>>> +}
>>>
>>> This was a surprise to me, I don't recall discussion of a "fetch-add-until"
>>> operation, and hadn't actually ever heard of it before.
>>
>> It was included from the first version on, and I'd mentioned it a couple times.
>
> There doesn't seem to be any hardware implementations of that in the patch.
> Is there any architecture that has an instruction or compiler intrinsic for
> that?

You can implement it rather efficiently on ll/sc architectures. But I don't really think it matters. I prefer add_until (I've seen it named saturated add before as well) to live in the atomics code, rather than reimplement it in atomics employing code. I guess you see that differently? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] better atomics - v0.6
On 09/24/2014 03:37 PM, Andres Freund wrote: > >+/* > >+ * pg_fetch_add_until_u32 - saturated addition to variable > >+ * > >+ * Returns the value of ptr after the arithmetic operation. > >+ * > >+ * Full barrier semantics. > >+ */ > >+STATIC_IF_INLINE uint32 > >+pg_atomic_fetch_add_until_u32(volatile pg_atomic_uint32 *ptr, int32 add_, > >+uint32 until) > >+{ > >+ CHECK_POINTER_ALIGNMENT(ptr, 4); > >+ return pg_atomic_fetch_add_until_u32_impl(ptr, add_, until); > >+} > >+ > >This was a surprise to me, I don't recall discussion of a "fetch-add-until" >operation, and hadn't actually ever heard of it before. It was included from the first version on, and I'd mentioned it a couple times. There don't seem to be any hardware implementations of that in the patch. Is there any architecture that has an instruction or compiler intrinsic for that? - Heikki
Re: [HACKERS] [v9.5] Custom Plan API
On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai wrote: > At this moment, I revised the above portion of the patches. > create_custom_plan() was modified to call the "PlanCustomPath" callback > next to the initialization of tlist and clauses. > > It's probably the same as what you suggested. create_custom_plan() is mis-named. It's actually only applicable to the custom-scan case, because it's triggered by create_plan_recurse() getting a path node with a T_CustomScan pathtype. Now, we could change that; although in general create_plan_recurse() dispatches on pathtype, we could make CustomPath an exception; the top of that function could say if (IsA(best_path, CustomPath)) { /* do custom stuff */ }. But the problem with that idea is that, when the custom path is specifically a custom scan, rather than a join or some other thing, you want to do all of the same processing that's in create_scan_plan(). So I think what should happen is that create_plan_recurse() should handle T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et al: by calling create_scan_plan(). The switch inside that function can then call a function create_customscan_plan() if it sees T_CustomScan. And that function will be simpler than the create_custom_plan() that you have now, and it will be named correctly, too. In ExplainNode(), I think sname should be set to "Custom Scan", not "Custom". And further down, the custom_name should be printed as "Custom Plan Provider" not just "Custom". setrefs.c has remaining handling for the scanrelid = 0 case; please remove that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] src/backend/replication/Makefile misses -I.
Hi, Because of the atomics patch I was building postgres with sun studio. Turns out vpath builds don't work in that scenario when building from git. The problem is that the replication Makefile override CPPFLAGS := -I$(srcdir) $(CPPFLAGS) includes the source directory, but not the current directory. The other Makefiles dealing with similar things do: override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) which looks right to me. The current override line is from 9cc2c182fc20d5 in reaction to #6073. Unless somebody protests I'm going to backpatch the -I. addition back to 9.1. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Sep 23, 2014 at 8:15 PM, Florian Weimer wrote: > * Ants Aasma: > >> CRC has exactly one hardware implementation in general purpose CPUs > > I'm pretty sure that's not true. Many general purpose CPUs have CRC > circuitry, and there must be some which also expose them as > instructions. I must eat my words here; indeed AMD processors starting from Bulldozer do implement the CRC32 instruction. However, according to Agner Fog, AMD's implementation has a 6 cycle latency and, more importantly, a throughput of one instruction per 6 cycles, while Intel's implementation on all CPUs except the new Atom has 3 cycle latency and 1 instruction/cycle throughput. This means that there still is a significant handicap for AMD platforms, not to mention Power or Sparc with no hardware support. Some ARMs implement CRC32, but I haven't researched what their performance is. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] make pg_controldata accept "-D dirname"
Updated patches attached. -- Abhijit >From b3bd465357f96ebf1953b3a98f25fb51bac5eb26 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:26:00 +0530 Subject: Make pg_controldata ignore a -D before DataDir --- doc/src/sgml/ref/pg_controldata.sgml| 5 +++-- src/bin/pg_controldata/pg_controldata.c | 13 +++-- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml index fbf40fc..b4be660 100644 --- a/doc/src/sgml/ref/pg_controldata.sgml +++ b/doc/src/sgml/ref/pg_controldata.sgml @@ -40,8 +40,9 @@ PostgreSQL documentation This utility can only be run by the user who initialized the cluster because it requires read access to the data directory. - You can specify the data directory on the command line, or use - the environment variable PGDATA. This utility supports the options + You can specify the data directory on the command line, with or without + -D or use the environment variable PGDATA. + This utility supports the options -V and --version, which print the pg_controldata version and exit. It also supports options -? 
and --help, which output the diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index f815024..cfa6911 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -31,14 +31,19 @@ static void usage(const char *progname) { + const char *pgdata; + + pgdata = getenv("PGDATA"); + if (!pgdata) + pgdata = ""; + printf(_("%s displays control information of a PostgreSQL database cluster.\n\n"), progname); printf(_("Usage:\n")); printf(_(" %s [OPTION] [DATADIR]\n"), progname); printf(_("\nOptions:\n")); + printf(_(" -D DATADIR data directory (default: \"%s\")\n"), pgdata); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -?, --help show this help, then exit\n")); - printf(_("\nIf no data directory (DATADIR) is specified, " - "the environment variable PGDATA\nis used.\n\n")); printf(_("Report bugs to .\n")); } @@ -120,7 +125,11 @@ main(int argc, char *argv[]) } if (argc > 1) + { DataDir = argv[1]; + if (strcmp(DataDir, "-D") == 0 && argc > 2) + DataDir = argv[2]; + } else DataDir = getenv("PGDATA"); if (DataDir == NULL) -- 1.9.1 >From 51c651a24db4f3b977fa38c08177acc062a5fb56 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:48:36 +0530 Subject: Make pg_resetxlog also accept -D datadir --- doc/src/sgml/ref/pg_resetxlog.sgml | 6 -- src/bin/pg_resetxlog/pg_resetxlog.c | 16 +++- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml index 0b53bd6..7b2386a 100644 --- a/doc/src/sgml/ref/pg_resetxlog.sgml +++ b/doc/src/sgml/ref/pg_resetxlog.sgml @@ -22,6 +22,7 @@ PostgreSQL documentation pg_resetxlog + -D datadir -f -n -o oid @@ -30,7 +31,7 @@ PostgreSQL documentation -m mxid,mxid -O mxoff -l xlogfile - datadir + datadir @@ -55,7 +56,8 @@ PostgreSQL documentation This utility can only be run by the user who installed the server, because it requires read/write 
access to the data directory. - For safety reasons, you must specify the data directory on the command line. + For safety reasons, you must specify the data directory on the command line + (with or without -D). pg_resetxlog does not use the environment variable PGDATA. diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c index 302d005..9364e49 100644 --- a/src/bin/pg_resetxlog/pg_resetxlog.c +++ b/src/bin/pg_resetxlog/pg_resetxlog.c @@ -89,7 +89,7 @@ main(int argc, char *argv[]) MultiXactId set_oldestmxid = 0; char *endptr; char *endptr2; - char *DataDir; + char *DataDir = NULL; int fd; set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_resetxlog")); @@ -111,10 +111,14 @@ main(int argc, char *argv[]) } - while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1) + while ((c = getopt(argc, argv, "D:fl:m:no:O:x:e:")) != -1) { switch (c) { + case 'D': +DataDir = optarg; +break; + case 'f': force = true; break; @@ -233,7 +237,7 @@ main(int argc, char *argv[]) } } - if (optind == argc) + if (DataDir == NULL && optind == argc) { fprintf(stderr, _("%s: no data directory specified\n"), progname); fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); @@ -257,7 +261,8 @@ main(int argc, char *argv[]) } #endif - DataDir = argv[optind]; + if (DataDir == NULL) + DataDir = argv[optind]; if (chdir(DataDir) < 0) { @@ -1077,8 +1082,9 @@ static void usage(void) { printf(_("%s resets the PostgreSQL transaction log.\n\n"), progname); - printf(_("Usage:\n %s [OPTION]... DATADIR\n\n"), progname); + pri
Re: [HACKERS] “Core” function in Postgres
On 09/24/2014 07:29 AM, Mingzhe Li wrote: > PS: I have the same post on stackoverflow. Since no one answered there, > I just report here. Thanks for mentioning it. In future, please include a link. You might want to wait more than a couple of hours too ;-) For reference, the Stack Overflow post is http://stackoverflow.com/q/26005359/398670 , where I answered in some detail. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Final Patch for GROUPING SETS
There's been a lot of discussion and I haven't followed it in detail. Andrew, there were some open questions, but have you gotten enough feedback so that you know what to do next? I'm trying to get this commitfest to an end, and this is still in "Needs Review" state... - Heikki
Re: [HACKERS] identify_locking_dependencies is broken for schema-only dumps
On Fri, Sep 19, 2014 at 11:23 AM, Robert Haas wrote: > This can lead to deadlocks during parallel restore. Test case: > > create table bar (a int primary key, b int); > create table baz (z int, a int references bar); > create view foo as select a, b, sum(1) from bar group by a union all > select z, a, 0 from baz; > > I dumped this with: pg_dump -Fc -s -f test.dmp > Then: while (dropdb rhaas; createdb; pg_restore -O -d rhaas -j3 > test.dmp); do true; done > > This quickly fails for me with: > > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 2155; 2606 47822 FK > CONSTRAINT baz_a_fkey rhaas > pg_restore: [archiver (db)] could not execute query: ERROR: deadlock detected > DETAIL: Process 81791 waits for AccessExclusiveLock on relation 47862 > of database 47861; blocked by process 81789. > Process 81789 waits for AccessShareLock on relation 47865 of database > 47861; blocked by process 81791. > HINT: See server log for query details. > Command was: ALTER TABLE ONLY baz > ADD CONSTRAINT baz_a_fkey FOREIGN KEY (a) REFERENCES bar(a); > WARNING: errors ignored on restore: 2 > > The attached patch seems to fix it for me. > > Comments? If there are no comments on this soon-ish, I'm going to push and back-patch the patch I attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] "Core" function in Postgres
On Tue, Sep 23, 2014 at 7:02 PM, Michael Paquier wrote: > On Wed, Sep 24, 2014 at 8:29 AM, Mingzhe Li wrote: >> Hi experts, >> >> I want to know what's the "core" function used in Postgres server? I am >> looking for something corresponding to main() in a simple C program. I want >> to know the file path and the function name. I am using Postgres 9.3.5, >> however I assume the "core" function will be unchanged between different >> revisions. > > In your question, it seems that you are looking for the main() call > for the binary postgres, which is located in src/backend/main/main.c. > At the bottom of main() you'll see as well a set of functions like > PostmasterMain or PostgresMain that really define the startup path > used. PostmasterMain for example starts the postmaster, that is then > able to itself start backend processes through PostgresMain()... As noted, main() is in main.c. Most of the interesting stuff that happens in the main execution loop for the backend is in tcop/postgres.c (tcop being shorthand for 'traffic cop' -- probably not a great name but it hails from the very early days of the project). That kinda answers your other question: these function names rarely change except as needed to meet new requirements. merlin
Re: [HACKERS] make pg_controldata accept "-D dirname"
Abhijit Menon-Sen wrote: > P.S. Trivia: I can't find any other options in Postgres where the > argument is mandatory but the option is optional. psql behaves similarly with its -d and -U switches; note --help: Usage: psql [OPTION]... [DBNAME [USERNAME]] ... -d, --dbname=DBNAME database name to connect to (default: "alvherre") ... -U, --username=USERNAME database user name (default: "alvherre") Both are optional and have defaults. Datadir for pg_controldata is also optional and --help contains this blurb: If no data directory (DATADIR) is specified, the environment variable PGDATA is used. I think you could replace those lines with -D DATADIR data directory (default: "value of $PGDATA") But psql --help doesn't use $PGUSER to expand the default value for -U; I'm not clear if this means we shouldn't expand PGDATA in the help line for pg_controldata, or need to fix psql to expand PGUSER in -U. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] missing isinf declaration on solaris
On 24.09.2014 16:21, Tom Lane wrote: Oskari Saarenmaa writes: ... so to enable XPG6 we'd need to use C99 mode anyway. OK. Could we just use -std=gnu99 (with -fgnu89-inline if required) with GCC on Solaris? ISTM it would be cleaner to just properly enable c99 mode rather than define an undocumented macro to use a couple of c99 declarations. Agreed, but what about non-GCC compilers? Solaris Studio defaults to "-xc99=all,no_lib" which, according to the man page, enables c99 language features but doesn't use c99 standard library semantics. isinf is defined to be a macro by c99 and doesn't require changing the c99 mode, so I'd just keep using the defaults with Solaris Studio for now. For GCC --- a/src/template/solaris +++ b/src/template/solaris @@ -0,0 +1,4 @@ +if test "$GCC" = yes ; then + CPPFLAGS="$CPPFLAGS -std=gnu99" +fi + gets rid of the warnings and passes tests. / Oskari
Re: [HACKERS] make pg_controldata accept "-D dirname"
On 09/24/2014 04:49 PM, Abhijit Menon-Sen wrote: At 2014-09-24 09:25:12 -0400, t...@sss.pgh.pa.us wrote: What's so hard about [ -D ] before the datadir argument? I'm sorry to have given you the impression that I objected to it because it was hard. Since I proposed the same thing a few lines after what you quoted, I guess I have to agree it's not hard. I think it's pointless, because if you're going to look at the usage message to begin with, why not do what it already says? But I don't care enough to argue about it any further. There's also the reverse situation: you see that a script contains a line like "pg_controldata -D foo". You're accustomed to doing just "pg_controldata foo", and you wonder what the -D option does. So you look it up in the docs or "pg_controldata --help". - Heikki
Re: [HACKERS] make pg_controldata accept "-D dirname"
At 2014-09-24 09:25:12 -0400, t...@sss.pgh.pa.us wrote: > > What's so hard about [ -D ] before the datadir argument? I'm sorry to have given you the impression that I objected to it because it was hard. Since I proposed the same thing a few lines after what you quoted, I guess I have to agree it's not hard. I think it's pointless, because if you're going to look at the usage message to begin with, why not do what it already says? But I don't care enough to argue about it any further. Patches attached. -- Abhijit P.S. Trivia: I can't find any other options in Postgres where the argument is mandatory but the option is optional. >From 69d7386ab59ca9df74af5abe5a5c3cf5a93e64bb Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:26:00 +0530 Subject: Make pg_controldata ignore a -D before DataDir --- src/bin/pg_controldata/pg_controldata.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index f815024..4386283 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -33,7 +33,7 @@ usage(const char *progname) { printf(_("%s displays control information of a PostgreSQL database cluster.\n\n"), progname); printf(_("Usage:\n")); - printf(_(" %s [OPTION] [DATADIR]\n"), progname); + printf(_(" %s [OPTION] [[-D] DATADIR]\n"), progname); printf(_("\nOptions:\n")); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -?, --help show this help, then exit\n")); @@ -120,7 +120,11 @@ main(int argc, char *argv[]) } if (argc > 1) + { DataDir = argv[1]; + if (strcmp(DataDir, "-D") == 0 && argc > 2) + DataDir = argv[2]; + } else DataDir = getenv("PGDATA"); if (DataDir == NULL) -- 1.9.1 >From 388b5009281184051398657449e649ec9585a242 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:48:36 +0530 Subject: Make pg_resetxlog also accept -D datadir --- 
src/bin/pg_resetxlog/pg_resetxlog.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c index 302d005..8ff5df0 100644 --- a/src/bin/pg_resetxlog/pg_resetxlog.c +++ b/src/bin/pg_resetxlog/pg_resetxlog.c @@ -89,7 +89,7 @@ main(int argc, char *argv[]) MultiXactId set_oldestmxid = 0; char *endptr; char *endptr2; - char *DataDir; + char *DataDir = NULL; int fd; set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_resetxlog")); @@ -111,10 +111,14 @@ main(int argc, char *argv[]) } - while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1) + while ((c = getopt(argc, argv, "D:fl:m:no:O:x:e:")) != -1) { switch (c) { + case 'D': +DataDir = optarg; +break; + case 'f': force = true; break; @@ -233,7 +237,7 @@ main(int argc, char *argv[]) } } - if (optind == argc) + if (DataDir == NULL && optind == argc) { fprintf(stderr, _("%s: no data directory specified\n"), progname); fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); @@ -257,7 +261,8 @@ main(int argc, char *argv[]) } #endif - DataDir = argv[optind]; + if (DataDir == NULL) + DataDir = argv[optind]; if (chdir(DataDir) < 0) { @@ -1077,7 +1082,7 @@ static void usage(void) { printf(_("%s resets the PostgreSQL transaction log.\n\n"), progname); - printf(_("Usage:\n %s [OPTION]... DATADIR\n\n"), progname); + printf(_("Usage:\n %s [OPTION]... [-D] DATADIR\n\n"), progname); printf(_("Options:\n")); printf(_(" -e XIDEPOCH set next transaction ID epoch\n")); printf(_(" -f force update to be done\n")); -- 1.9.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On 09/24/2014 12:22 AM, Kevin Grittner wrote: Heikki Linnakangas wrote: instead of passing parameters to the SPI calls individually, you invented SPI_register_tuplestore which affects all subsequent SPI calls. All subsequent SPI calls on that particular SPI connection until it is closed, except for any tuplestores that are later unregistered. Nested SPI connections do not automatically inherit the named tuplestores; whatever code opens an SPI connection would need to register them for the new context, if desired. This seemed to me to provide minimal disruption to the existing SPI callers who might want to use this. Yeah, I got that. And note that I'm not saying that's necessarily a bad design per se - it's just that it's different from the way parameters work, and I don't like it for that reason. You could imagine doing the same for parameters; have a SPI_register_param() function that you could use to register parameter types, and the parameters could then be referenced in any SPI calls that follow (within the same connection). But as the code stands, SPI is stateless wrt. parameters, and tuplestores or relation parameters should follow the lead. The next question is how to pass the new hooks and tuplestores. However, there doesn't seem to be any way to do one-shot queries with a ParserSetupHook and ParamListInfo. That seems like an oversight, and it would be nice to address that anyway. There are dozens of SPI_prepare* and SPI_exec* calls scattered all over the backend, pl, and contrib code which appear to get along fine without that. Yeah. None of the current callers have apparently needed that functionality. But it's not hard to imagine cases where it would be needed. For example, imagine a variant of EXECUTE '...' where all the PL/pgSQL variables can be used in the query, like they can in static queries: declare myvar int4; tablename text; begin ...
EXECUTE 'insert into ' || tablename ||' values (myvar)'; end; Currently you have to use $1 in place of the variable name, and pass the variable's current value with USING. If you wanted to make the above work, you would need a variant of SPI_execute that can run a one-shot query, with a parser-hook. Whether you want to use a parser-hook or not is orthogonal to whether or not you want to run a one-shot query or prepare it and keep the plan. Partly it may be because it involves something of a modularity violation; the comment for the function used for this call (where it *is* used) says: /* * plpgsql_parser_setup set up parser hooks for dynamic parameters * * Note: this routine, and the hook functions it prepares for, are logically * part of plpgsql parsing. But they actually run during function execution, * when we are ready to evaluate a SQL query or expression that has not * previously been parsed and planned. */ No, that's something completely different. The comment points out that even though plpgsql_parser_setup is in pl_comp.c, which contains code related to compiling a PL/pgSQL function, it's actually called at execution time, not compilation time. Can you clarify what benefit you see to modifying the SPI API the way you suggest, and what impact it might have on existing calling code? Well, we'll have to keep the existing functions anyway, to avoid breaking 3rd party code that uses them, so there would be no impact on existing code. The benefit would be that you could use the parser hooks and the ParamListInfo struct even when doing a one-shot query. Or perhaps you could just use SPI_prepare_params + SPI_execute_plan_with_paramlist even for one-shot queries. There is some overhead when a SPIPlan has to be allocated, but maybe it's not big enough to worry about. That would be worth measuring before adding new functions to the SPI. PS. the copy/read/write functions for TuplestoreRelation in the patch are broken; TuplestoreRelation is not a subclass of Plan.
I'm not sure what you mean by "broken" -- could you elaborate? Sure: + /* + * _copyTuplestoreRelation + */ + static TuplestoreRelation * + _copyTuplestoreRelation(const TuplestoreRelation *from) + { + TuplestoreRelation *newnode = makeNode(TuplestoreRelation); + + /* +* copy node superclass fields +*/ + CopyPlanFields((const Plan *) from, (Plan *) newnode); + + /* +* copy remainder of node +*/ + COPY_STRING_FIELD(refname); + + return newnode; + } You cast the TuplestoreRelation to Plan, and pass it to CopyPlanFields. That will crash, because TuplestoreRelation is nothing like a Plan: + /* + * TuplestoreRelation - + * synthetic node for tuplestore passed in to the query by name + * + * This is initially added to support trigger transition tables, but may find + * other uses, so we try to keep it generic. + */ + typedef struct TuplestoreRelation + { + NodeTag type; + char *refname; + } TuplestoreRelat
Re: [HACKERS] make pg_controldata accept "-D dirname"
Abhijit Menon-Sen writes: > At 2014-09-24 16:02:29 +0300, hlinnakan...@vmware.com wrote: >> Thanks. Please update the docs and usage(), too. > I'm sorry, but I don't think it would be an improvement to make the docs > explain that it's valid to use either "-D datadir" or specify it without > an option. What's so hard about [ -D ] before the datadir argument? regards, tom lane
Re: [HACKERS] missing isinf declaration on solaris
Oskari Saarenmaa writes: > ... so to enable XPG6 we'd need to use C99 mode anyway. OK. > Could we just use > -std=gnu99 (with -fgnu89-inline if required) with GCC on Solaris? ISTM > it would be cleaner to just properly enable c99 mode rather than define > an undocumented macro to use a couple of c99 declarations. Agreed, but what about non-GCC compilers? regards, tom lane
Re: [HACKERS] make pg_controldata accept "-D dirname"
To put it another way, I doubt any of the people who were surprised by it looked at either the usage message or the docs. ;-) -- Abhijit
Re: [HACKERS] missing isinf declaration on solaris
On 2014-09-24 08:25:34 -0400, Tom Lane wrote: > I'm worried that __C99FEATURES__ might do other, not-so-C89-compatible > things in later Solaris releases. Possibly that risk could be addressed > by having src/template/solaris make an OS version check before adding the > switch, but it'd be a bit painful probably. As we want to actually be able to compile with a C99 compiler I don't see much harm in that? Many compilers these days have C99 stuff enabled by default anyway... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] make pg_controldata accept "-D dirname"
At 2014-09-24 16:02:29 +0300, hlinnakan...@vmware.com wrote: > > Thanks. Please update the docs and usage(), too. I'm sorry, but I don't think it would be an improvement to make the docs explain that it's valid to use either "-D datadir" or specify it without an option. If both commands were changed to *require* "-D datadir", that would be worth documenting. But of course I didn't want to change any behaviour for people who have a better memory than I do. I think it's all right as a quick hack to remove an inconsistency that has clearly annoyed many people, but not much more. Would you be OK with my just sticking a "[-D]" before DATADIR in the usage() message for both programs, with no further explanation? (But to be honest, I don't think even that is necessary or desirable.) -- Abhijit
Re: [HACKERS] missing isinf declaration on solaris
On 24.09.2014 15:25, Tom Lane wrote: Oskari Saarenmaa writes: GCC 4.9 build on Solaris 10 shows these warnings about isinf: float.c: In function 'is_infinite': float.c:178:2: warning: implicit declaration of function 'isinf' Ugh. isinf declaration is in which is included by , but it's surrounded by #if defined(_STDC_C99) || _XOPEN_SOURCE - 0 >= 600 || defined(__C99FEATURES__). A couple of quick Google searches suggest that some other projects have worked around this by always defining __C99FEATURES__ even if compiling in C89 mode. __C99FEATURES__ is only used by math.h and fenv.h in /usr/include. Should we just add -D__C99FEATURES__ to CPPFLAGS in src/template/solaris, add our own declaration of isinf(), or do something else about the warning? I'm worried that __C99FEATURES__ might do other, not-so-C89-compatible things in later Solaris releases. Possibly that risk could be addressed by having src/template/solaris make an OS version check before adding the switch, but it'd be a bit painful probably. Based on the #if you show, I'd be more inclined to think about defining _XOPEN_SOURCE to get the result. There is precedent for that in src/template/hpux which does CPPFLAGS="$CPPFLAGS -D_XOPEN_SOURCE_EXTENDED" I forget what-all _XOPEN_SOURCE_EXTENDED enables exactly, but if memory serves there were a nontrivial number of now-considered-standard features turned on by that in ancient HPUX releases. If you want to pursue this route, you'd need to do a bit of digging to see what else _XOPEN_SOURCE controls in Solaris and if there is some front-end feature macro (like _XOPEN_SOURCE_EXTENDED) that you're supposed to set instead of touching it directly. Looking at standards(5) and /usr/include/sys/feature_tests.h it looks like _XOPEN_SOURCE_EXTENDED enables the XPG4v2 environment. _XOPEN_SOURCE=600 enables XPG6, but feature_tests.h also has this bit: /* * It is invalid to compile an XPG3, XPG4, XPG4v2, or XPG5 application * using c99.
The same is true for POSIX.1-1990, POSIX.2-1992, POSIX.1b, * and POSIX.1c applications. Likewise, it is invalid to compile an XPG6 * or a POSIX.1-2001 application with anything other than a c99 or later * compiler. Therefore, we force an error in both cases. */ so to enable XPG6 we'd need to use C99 mode anyway. Could we just use -std=gnu99 (with -fgnu89-inline if required) with GCC on Solaris? ISTM it would be cleaner to just properly enable c99 mode rather than define an undocumented macro to use a couple of c99 declarations. / Oskari
Re: [HACKERS] make pg_controldata accept "-D dirname"
On 09/24/2014 02:21 PM, Abhijit Menon-Sen wrote: At 2014-09-24 14:03:41 +0300, hlinnakan...@vmware.com wrote: Ah, I frequently run into that too; but with pg_resetxlog. Well, that's no fun. Patch attached. Thanks. Please update the docs and usage(), too. - Heikki
Re: [HACKERS] make pg_controldata accept "-D dirname"
On 24 September 2014 12:04, Magnus Hagander wrote: > On Wed, Sep 24, 2014 at 12:59 PM, Abhijit Menon-Sen > wrote: > > Hi. > > > > I can never remember that pg_controldata takes only a dirname rather > > than "-D dirname", and I gather I'm not the only one. Here's a tiny > > patch to make it work either way. As far as I can see, it doesn't > > break anything, not even if you have a data directory named "-D". > > I haven't looked at the code, but definitely +1 for the feature. > That's really quite annoying. > And here I was thinking it was just me. +1 Thom
Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD
Heikki Linnakangas wrote: > On 09/24/2014 01:50 PM, Thom Brown wrote: > >>I am sending two patches > >> > >>first is fast fix > >> > >>second fix is implementation of Heikki' idea. > > > >I'm guessing this issue is still unresolved? It would be nice to get this > >off the open items list. > > Yeah, I had completely forgotten about this. Alvaro, could you > finish this off? Ah, I had forgotten too. Sure, will look into it shortly. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] make pg_controldata accept "-D dirname"
Magnus Hagander wrote: > On Wed, Sep 24, 2014 at 12:59 PM, Abhijit Menon-Sen > wrote: > > Hi. > > > > I can never remember that pg_controldata takes only a dirname rather > > than "-D dirname", and I gather I'm not the only one. Here's a tiny > > patch to make it work either way. As far as I can see, it doesn't > > break anything, not even if you have a data directory named "-D". > > I haven't looked at the code, but definitely +1 for the feature. > That's really quite annoying. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] better atomics - v0.6
On 2014-09-24 14:59:19 +0300, Heikki Linnakangas wrote: > On 09/23/2014 12:01 AM, Andres Freund wrote: > >Hi, > > > >I've finally managed to incorporate (all the?) feedback I got for > >0.5. Imo the current version looks pretty good. > > Thanks! I agree it looks good. Some random comments after a quick > read-through: > > There are some spurious whitespace changes in spin.c. Huh. Unsure how they crept in. Will fix. > >+ * These files can provide the full set of atomics or can do pretty much > >+ * nothing if all the compilers commonly used on these platforms provide > >+ * useable generics. It will often make sense to define memory barrier > >+ * semantics in those, since e.g. the compiler intrinsics for x86 memory > >+ * barriers can't know we don't need read/write barriers anything more than > >a > >+ * compiler barrier. > > I didn't understand the latter sentence. Hm. The background here is that postgres doesn't need memory barriers to affect sse et al registers - which is why read/write barriers currently can be defined to be empty on TSO architectures like x86. But e.g. gcc's barrier intrinsics don't assume that, so they do a mfence or lock xaddl for read/write barriers. Which isn't cheap. Will try to find a way to rephrase. > >+ * Don't add an inline assembly of the actual atomic operations if all the > >+ * common implementations of your platform provide intrinsics. Intrinsics > >are > >+ * much easier to understand and potentially support more architectures. > > What about uncommon implementations, then? I think we need to provide inline > assembly if any supported implementation on the platform does not provide > intrinsics, otherwise we fall back to emulation. Or is that exactly what > you're envisioning we do? Yes, that's what I'm envisioning. E.g. old versions of solaris studio don't provide compiler intrinsics and also don't provide reliable ways to get barrier semantics. It doesn't seem worthwhile to add an inline assembly version for that special case.
> I believe such a situation hasn't come up on the currently supported > platforms, so we don't necessary have to have a solution for that, but the > comment doesn't feel quite right either. It's the reason I added a inline assembly version of atomics for x86 gcc... > >+#define CHECK_POINTER_ALIGNMENT(ptr, bndr) \ > >+Assert(TYPEALIGN((uintptr_t)(ptr), bndr)) > > Would be better to call this AssertAlignment, to make it clear that this is > an assertion, not something like CHECK_FOR_INTERRUPTS(). And perhaps move > this to c.h where other assertions are - this seems generally useful. Sounds good. > >+ * Spinloop delay - > > Spurious dash. Probably should rather add a bit more explanation... > >+/* > >+ * pg_fetch_add_until_u32 - saturated addition to variable > >+ * > >+ * Returns the the value of ptr after the arithmetic operation. > >+ * > >+ * Full barrier semantics. > >+ */ > >+STATIC_IF_INLINE uint32 > >+pg_atomic_fetch_add_until_u32(volatile pg_atomic_uint32 *ptr, int32 add_, > >+ uint32 until) > >+{ > >+CHECK_POINTER_ALIGNMENT(ptr, 4); > >+return pg_atomic_fetch_add_until_u32_impl(ptr, add_, until); > >+} > >+ > > This was a surprise to me, I don't recall discussion of an "fetch-add-until" > operation, and hadn't actually ever heard of it before. It was included from the first version on, and I'd mentioned it a couple times. > None of the subsequent patches seem to use it either. Can we just > leave this out? I have a further patch (lockless buffer header manipulation) that uses it to manipulate the usagecount. I can add it back later if you feel strongly about it. > s/the the/the/ > > >+#ifndef PG_HAS_ATOMIC_WRITE_U64 > >+#define PG_HAS_ATOMIC_WRITE_U64 > >+static inline void > >+pg_atomic_write_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) > >+{ > >+/* > >+ * 64 bit writes aren't safe on all platforms. In the generic > >+ * implementation implement them as an atomic exchange. That might > >store a > >+ * 0, but only if the prev. value also was a 0. 
> >+ */ > >+pg_atomic_exchange_u64_impl(ptr, val); > >+} > >+#endif > > Why is 0 special? That bit should actually be in the read_u64 implementation, not write... If you do a compare and exchange you have to provide an 'expected' value. So, if you do a compare and exchange and the previous value is 0 the value will be written superfluously - but the actual value won't change. > >+/* > >+ * Can't run the test under the semaphore emulation, it doesn't handle > >+ * checking edge cases well. > >+ */ > >+#ifndef PG_HAVE_ATOMIC_FLAG_SIMULATION > >+test_atomic_flag(); > >+#endif > > Huh? Is the semaphore emulation broken, then? No, it's not. The semaphore emulation doesn't implement pg_atomic_unlocked_test_flag() (as there's no sane way to do that with semaphores that I know of). Instead it always returns true. Which disturbs the test. I guess I'll expand that
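The "fetch-add-until" (saturated addition) operation discussed above can be sketched as a compare-and-exchange loop. The following is a stand-alone approximation using C11 atomics: the name and semantics mirror the pg_atomic_fetch_add_until_u32 described in the patch, but this is an illustrative version, not the backend's code.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/*
 * Saturated atomic addition: add 'add_' to *ptr, but never past 'until'.
 * Returns the value of *ptr after the operation, matching the comment in
 * the quoted patch.  Sketch of the technique only.
 */
uint32_t
fetch_add_until_u32(_Atomic uint32_t *ptr, int32_t add_, uint32_t until)
{
	uint32_t	old = atomic_load(ptr);

	for (;;)
	{
		uint32_t	newval;

		if (old >= until)
			return old;			/* already saturated, nothing to do */

		newval = old + (uint32_t) add_;
		if (newval > until)
			newval = until;		/* clamp at the limit */

		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ptr, &old, newval))
			return newval;
	}
}
```

This is the shape of operation one would use to bump a buffer usage count that is capped at a maximum (e.g. 5), which is the lockless-buffer-header use case Andres mentions.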
Re: [HACKERS] missing isinf declaration on solaris
Oskari Saarenmaa writes: > GCC 4.9 build on Solaris 10 shows these warnings about isinf: > float.c: In function 'is_infinite': > float.c:178:2: warning: implicit declaration of function 'isinf' Ugh. > isinf declaration is in which is included by , > but it's surrounded by #if defined(_STDC_C99) || _XOPEN_SOURCE - 0 >= > 600 || defined(__C99FEATURES__). A couple of quick Google searches > suggests that some other projects have worked around this by always > defining __C99FEATURES__ even if compiling in C89 mode. __C99FEATURES__ > is only used by math.h and fenv.h in /usr/include. > Should we just add -D__C99FEATURES__ to CPPFLAGS in > src/template/solaris, add our own declaration of isinf() or do something > else about the warning? I'm worried that __C99FEATURES__ might do other, not-so-C89-compatible things in later Solaris releases. Possibly that risk could be addressed by having src/template/solaris make an OS version check before adding the switch, but it'd be a bit painful probably. Based on the #if you show, I'd be more inclined to think about defining _XOPEN_SOURCE to get the result. There is precedent for that in src/template/hpux which does CPPFLAGS="$CPPFLAGS -D_XOPEN_SOURCE_EXTENDED" I forget what-all _XOPEN_SOURCE_EXTENDED enables exactly, but if memory serves there were a nontrivial number of now-considered-standard features turned on by that in ancient HPUX releases. If you want to pursue this route, you'd need to do a bit of digging to see what else _XOPEN_SOURCE controls in Solaris and if there is some front-end feature macro (like _XOPEN_SOURCE_EXTENDED) that you're supposed to set instead of touching it directly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] better atomics - v0.6
On 09/23/2014 12:01 AM, Andres Freund wrote: Hi, I've finally managed to incorporate (all the?) feedback I got for 0.5. Imo the current version looks pretty good. Thanks! I agree it looks good. Some random comments after a quick read-through: There are some spurious whitespace changes in spin.c. + * These files can provide the full set of atomics or can do pretty much + * nothing if all the compilers commonly used on these platforms provide + * useable generics. It will often make sense to define memory barrier + * semantics in those, since e.g. the compiler intrinsics for x86 memory + * barriers can't know we don't need read/write barriers anything more than a + * compiler barrier. I didn't understand the latter sentence. + * Don't add an inline assembly of the actual atomic operations if all the + * common implementations of your platform provide intrinsics. Intrinsics are + * much easier to understand and potentially support more architectures. What about uncommon implementations, then? I think we need to provide inline assembly if any supported implementation on the platform does not provide intrinsics, otherwise we fall back to emulation. Or is that exactly what you're envisioning we do? I believe such a situation hasn't come up on the currently supported platforms, so we don't necessarily have to have a solution for that, but the comment doesn't feel quite right either. +#define CHECK_POINTER_ALIGNMENT(ptr, bndr) \ + Assert(TYPEALIGN((uintptr_t)(ptr), bndr)) Would be better to call this AssertAlignment, to make it clear that this is an assertion, not something like CHECK_FOR_INTERRUPTS(). And perhaps move this to c.h where other assertions are - this seems generally useful. + * Spinloop delay - Spurious dash. +/* + * pg_fetch_add_until_u32 - saturated addition to variable + * + * Returns the the value of ptr after the arithmetic operation. + * + * Full barrier semantics.
+ */ +STATIC_IF_INLINE uint32 +pg_atomic_fetch_add_until_u32(volatile pg_atomic_uint32 *ptr, int32 add_, + uint32 until) +{ + CHECK_POINTER_ALIGNMENT(ptr, 4); + return pg_atomic_fetch_add_until_u32_impl(ptr, add_, until); +} + This was a surprise to me, I don't recall discussion of an "fetch-add-until" operation, and hadn't actually ever heard of it before. None of the subsequent patches seem to use it either. Can we just leave this out? s/the the/the/ +#ifndef PG_HAS_ATOMIC_WRITE_U64 +#define PG_HAS_ATOMIC_WRITE_U64 +static inline void +pg_atomic_write_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) +{ + /* +* 64 bit writes aren't safe on all platforms. In the generic +* implementation implement them as an atomic exchange. That might store a +* 0, but only if the prev. value also was a 0. +*/ + pg_atomic_exchange_u64_impl(ptr, val); +} +#endif Why is 0 special? +* fail or suceed, but always return the old value s/suceed/succeed/. Add a full stop to end. + /* +* Can't run the test under the semaphore emulation, it doesn't handle +* checking edge cases well. +*/ +#ifndef PG_HAVE_ATOMIC_FLAG_SIMULATION + test_atomic_flag(); +#endif Huh? Is the semaphore emulation broken, then? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make pg_controldata accept "-D dirname"
On 24 September 2014 17:15, Michael Paquier wrote: > On Wed, Sep 24, 2014 at 7:59 PM, Abhijit Menon-Sen > wrote: > > I can never remember that pg_controldata takes only a dirname rather > > than "-D dirname", and I gather I'm not the only one. Here's a tiny > > patch to make it work either way. As far as I can see, it doesn't > > break anything, not even if you have a data directory named "-D". > +1. I forget it all the time as well... +1, I always get confused when pg_controldata does not work with -D.
Re: [HACKERS] jsonb format is pessimal for toast compression
Heikki Linnakangas writes: > On 09/24/2014 08:16 AM, Tom Lane wrote: >> Heikki's patch would eat up the high-order JEntry bits, but the other >> points remain. > If we don't need to be backwards-compatible with the 9.4beta on-disk > format, we don't necessarily need to eat the high-order JEntry bit. You > can just assume that every nth element is stored as an offset, and > the rest as lengths. Although it would be nice to have the flag for it > explicitly. If we go with this approach, I think that we *should* eat the high bit for it. The main reason I want to do that is that it avoids having to engrave the value of N on stone tablets. I think that we should use a pretty large value of N --- maybe 32 or so --- and having the freedom to change it later based on experience seems like a good thing. regards, tom lane
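The every-nth-element scheme under discussion can be made concrete with a small sketch. Assume (hypothetically — the names below are not the actual jsonb code) a flat array where entries[i] holds an end offset whenever i is a multiple of the stride N, and a length otherwise. Random access to any element then costs at most N-1 additions, while an array of mostly lengths stays friendly to TOAST compression:

```c
#include <stdint.h>
#include <assert.h>

#define STRIDE 32				/* the "pretty large value of N" */

/*
 * entries[i] is the end offset of element i when i % STRIDE == 0,
 * and the length of element i otherwise.  To find the end offset of
 * element k, start from the nearest stored offset at or before k and
 * add the intervening lengths -- at most STRIDE-1 of them.
 * Hypothetical sketch of the scheme, not the jsonb implementation.
 */
uint32_t
jentry_end_offset(const uint32_t *entries, int k)
{
	int			anchor = k - (k % STRIDE);	/* entry stored as an offset */
	uint32_t	off = entries[anchor];
	int			i;

	for (i = anchor + 1; i <= k; i++)
		off += entries[i];					/* entries stored as lengths */
	return off;
}
```

This also shows why N need not be engraved in stone if a flag bit marks which entries are offsets: the reader walks back to the nearest flagged entry rather than relying on a fixed stride.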
Re: [HACKERS] make pg_controldata accept "-D dirname"
On Wed, Sep 24, 2014 at 7:59 PM, Abhijit Menon-Sen wrote: > I can never remember that pg_controldata takes only a dirname rather > than "-D dirname", and I gather I'm not the only one. Here's a tiny > patch to make it work either way. As far as I can see, it doesn't > break anything, not even if you have a data directory named "-D". +1. I forget it all the time as well... -- Michael
Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
Hi Andres, Robert. I've attached four patches here. 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to earlier in StartupXLOG. 2. Inside that function, issue fsync()s for the main forks we create by copying the _init fork. 3. A small fixup to add a const to "typedef char *FileName", because the earlier patch gave me warnings about discarding const-ness. This is consistent with many other functions in fd.c that take const char *. 4. Issue an fsync() on the data directory at startup if we need to perform crash recovery. I created some unlogged relations, performed an immediate shutdown, and then straced postgres as it performed crash recovery. The changes in (2) do indeed fsync the files we create by copying *_init, and don't fsync anything else (at least not after I fixed a bug ;-). I did not do anything about the END_OF_RECOVERY checkpoint setting the ControlFile->state to DB_SHUTDOWNED, because it wasn't clear to me if there was any agreement on what to do. I would be happy to submit a followup patch if we reach some decision about it. Is this what you had in mind? -- Abhijit >From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 14:43:18 +0530 Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier We need to call this after recovery, but not after the END_OF_RECOVERY checkpoint is written. If we crash after that checkpoint, for example because of ENOSPC in PreallocXlogFiles, the checkpoint has already set the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery again at startup. Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) in the first cleanup, this leaves the database with a bunch of _init forks for unlogged relations, but no main forks, leading to scary errors. See thread from 20140912112246.ga4...@alap3.anarazel.de for details. 
--- src/backend/access/transam/xlog.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 46eef5f..218f7fb 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6863,6 +6863,14 @@ StartupXLOG(void) ShutdownWalRcv(); /* + * Reset initial contents of unlogged relations. This has to be done + * AFTER recovery is complete so that any unlogged relations created + * during recovery also get picked up. + */ + if (InRecovery) + ResetUnloggedRelations(UNLOGGED_RELATION_INIT); + + /* * We don't need the latch anymore. It's not strictly necessary to disown * it, but let's do it for the sake of tidiness. */ @@ -7130,14 +7138,6 @@ StartupXLOG(void) PreallocXlogFiles(EndOfLog); /* - * Reset initial contents of unlogged relations. This has to be done - * AFTER recovery is complete so that any unlogged relations created - * during recovery also get picked up. - */ - if (InRecovery) - ResetUnloggedRelations(UNLOGGED_RELATION_INIT); - - /* * Okay, we're officially UP. */ InRecovery = false; -- 1.9.1 >From 7eba57b5ed9e1eda1fa1b14b32a82828617d823e Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:01:37 +0530 Subject: =?UTF-8?q?Make=20ResetUnloggedRelations(=E2=80=A6=5FINIT)=20fsync?= =?UTF-8?q?=20newly-created=20main=20forks?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit During UNLOGGED_RELATION_INIT, after we have copied ${x}_init to $x, we issue fsync()s for the newly-created files. We depend on their existence and a checkpoint isn't going to fsync them for us during recovery. See thread from 20140912112246.ga4...@alap3.anarazel.de for details. 
--- src/backend/storage/file/reinit.c | 36 1 file changed, 36 insertions(+) diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c index 3229f41..4febf6f 100644 --- a/src/backend/storage/file/reinit.c +++ b/src/backend/storage/file/reinit.c @@ -339,6 +339,42 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op) } FreeDir(dbspace_dir); + + /* + * copy_file() above has already called pg_flush_data() on the + * files it created. Now we need to fsync those files, because + * a checkpoint won't do it for us while we're in recovery. + */ + + dbspace_dir = AllocateDir(dbspacedirname); + while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL) + { + ForkNumber forkNum; + int oidchars; + char oidbuf[OIDCHARS + 1]; + char mainpath[MAXPGPATH]; + + /* Skip anything that doesn't look like a relation data file. */ + if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars, + &forkNum)) +continue; + + /* Also skip it unless this is the init fork. */ + if (forkNum != INIT_FORKNUM) +continue; + + /* Construct main fork pathname. */ + memcpy(oidbuf, de->d_name, oidchars); + oidbuf[oidchars] = '\
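The core idea of the second patch — fsync the main forks just created by copying the _init forks, because no checkpoint will do it for us during recovery — boils down to the following POSIX pattern. This is a simplified stand-in for the backend's file-handling helpers, not the patch itself:

```c
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>

/*
 * Open a file and force its contents to stable storage.  After copying
 * an _init fork to the main fork, calling this on the new file ensures
 * a crash before the next checkpoint cannot lose it.  Simplified: the
 * backend version also retries on EINTR and handles directories.
 */
int
fsync_fname(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Note that fsyncing the file alone is not always enough: making a newly created file durable can also require fsyncing its parent directory, which is why the fourth patch fsyncs the data directory at the start of crash recovery.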
Re: [HACKERS] make pg_controldata accept "-D dirname"
At 2014-09-24 14:03:41 +0300, hlinnakan...@vmware.com wrote: > > Ah, I frequently run into that too; but with pg_resetxlog. Well, that's no fun. Patch attached. -- Abhijit >From 23fc4d90d0353e1c6d65ca5715fc0338199f01cf Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:48:36 +0530 Subject: Make pg_resetxlog also accept -D datadir --- src/bin/pg_resetxlog/pg_resetxlog.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c index 302d005..3b43aff 100644 --- a/src/bin/pg_resetxlog/pg_resetxlog.c +++ b/src/bin/pg_resetxlog/pg_resetxlog.c @@ -89,7 +89,7 @@ main(int argc, char *argv[]) MultiXactId set_oldestmxid = 0; char *endptr; char *endptr2; - char *DataDir; + char *DataDir = NULL; int fd; set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_resetxlog")); @@ -111,10 +111,14 @@ main(int argc, char *argv[]) } - while ((c = getopt(argc, argv, "fl:m:no:O:x:e:")) != -1) + while ((c = getopt(argc, argv, "D:fl:m:no:O:x:e:")) != -1) { switch (c) { + case 'D': +DataDir = optarg; +break; + case 'f': force = true; break; @@ -233,7 +237,7 @@ main(int argc, char *argv[]) } } - if (optind == argc) + if (DataDir == NULL && optind == argc) { fprintf(stderr, _("%s: no data directory specified\n"), progname); fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); @@ -257,7 +261,8 @@ main(int argc, char *argv[]) } #endif - DataDir = argv[optind]; + if (DataDir == NULL) + DataDir = argv[optind]; if (chdir(DataDir) < 0) { -- 1.9.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make pg_controldata accept "-D dirname"
On 09/24/2014 01:59 PM, Abhijit Menon-Sen wrote: Hi. I can never remember that pg_controldata takes only a dirname rather than "-D dirname", and I gather I'm not the only one. Here's a tiny patch to make it work either way. As far as I can see, it doesn't break anything, not even if you have a data directory named "-D". Ah, I frequently run into that too; but with pg_resetxlog. - Heikki
Re: [HACKERS] make pg_controldata accept "-D dirname"
On Wed, Sep 24, 2014 at 12:59 PM, Abhijit Menon-Sen wrote: > Hi. > > I can never remember that pg_controldata takes only a dirname rather > than "-D dirname", and I gather I'm not the only one. Here's a tiny > patch to make it work either way. As far as I can see, it doesn't > break anything, not even if you have a data directory named "-D". I haven't looked at the code, but definitely +1 for the feature. That's really quite annoying. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD
On 09/24/2014 01:50 PM, Thom Brown wrote: On 15 August 2014 16:31, Pavel Stehule wrote: 2014-08-14 9:03 GMT+02:00 Heikki Linnakangas : On 08/14/2014 06:53 AM, Joachim Wieland wrote: I'm seeing an assertion failure with "pg_dump -c --if-exists" which is not ready to handle BLOBs it seems: pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark != ((void *)0)' failed. To reproduce: $ createdb test $ pg_dump -c --if-exists test (works, dumps empty database) $ psql test -c "select lo_create(1);" $ pg_dump -c --if-exists test (fails, with the above mentioned assertion) The code tries to inject an "IF EXISTS" into the already-construct DROP command, but it doesn't work for large objects, because the deletion command looks like "SELECT pg_catalog.lo_unlink(xxx)". There is no DROP there. I believe we could use "SELECT pg_catalog.lo_unlink(loid) FROM pg_catalog.pg_largeobject_metadata WHERE loid = xxx". pg_largeobject_metadata table didn't exist before version 9.0, but we don't guarantee pg_dump's output to be compatible in that direction anyway, so I think that's fine. The quick fix would be to add an exception for blobs, close to where Assert is. There are a few exceptions there already. A cleaner solution would be to add a new argument to ArchiveEntry and make the callers responsible for providing an "DROP IF EXISTS" query, but that's not too appetizing because for most callers it would be boring boilerplate code. Perhaps add an argument, but if it's NULL, ArchiveEntry would form the if-exists query automatically from the DROP query. I am sending two patches first is fast fix second fix is implementation of Heikki' idea. I'm guessing this issue is still unresolved? It would be nice to get this off the open items list. Yeah, I had completely forgotten about this. Alvaro, could you finish this off? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] make pg_controldata accept "-D dirname"
Hi. I can never remember that pg_controldata takes only a dirname rather than "-D dirname", and I gather I'm not the only one. Here's a tiny patch to make it work either way. As far as I can see, it doesn't break anything, not even if you have a data directory named "-D". -- Abhijit >From 15d43a3a47b3042d3365cf86d4bf585ed00fa744 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen Date: Wed, 24 Sep 2014 16:26:00 +0530 Subject: Make pg_controldata ignore a -D before DataDir --- src/bin/pg_controldata/pg_controldata.c | 4 1 file changed, 4 insertions(+) diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index f815024..7e8d0c2 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -120,7 +120,11 @@ main(int argc, char *argv[]) } if (argc > 1) + { DataDir = argv[1]; + if (strcmp(DataDir, "-D") == 0 && argc > 2) + DataDir = argv[2]; + } else DataDir = getenv("PGDATA"); if (DataDir == NULL) -- 1.9.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD
On 15 August 2014 16:31, Pavel Stehule wrote: > Hi > > > 2014-08-14 9:03 GMT+02:00 Heikki Linnakangas : > >> On 08/14/2014 06:53 AM, Joachim Wieland wrote: >> >>> I'm seeing an assertion failure with "pg_dump -c --if-exists" which is >>> not ready to handle BLOBs it seems: >>> >>> pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark != >>> ((void *)0)' failed. >>> >>> To reproduce: >>> >>> $ createdb test >>> $ pg_dump -c --if-exists test (works, dumps empty database) >>> $ psql test -c "select lo_create(1);" >>> $ pg_dump -c --if-exists test (fails, with the above mentioned >>> assertion) >>> >> >> The code tries to inject an "IF EXISTS" into the already-construct DROP >> command, but it doesn't work for large objects, because the deletion >> command looks like "SELECT pg_catalog.lo_unlink(xxx)". There is no DROP >> there. >> >> I believe we could use "SELECT pg_catalog.lo_unlink(loid) FROM >> pg_catalog.pg_largeobject_metadata WHERE loid = xxx". >> pg_largeobject_metadata table didn't exist before version 9.0, but we don't >> guarantee pg_dump's output to be compatible in that direction anyway, so I >> think that's fine. >> >> The quick fix would be to add an exception for blobs, close to where >> Assert is. There are a few exceptions there already. A cleaner solution >> would be to add a new argument to ArchiveEntry and make the callers >> responsible for providing an "DROP IF EXISTS" query, but that's not too >> appetizing because for most callers it would be boring boilerplate code. >> Perhaps add an argument, but if it's NULL, ArchiveEntry would form the >> if-exists query automatically from the DROP query. >> > > I am sending two patches > > first is fast fix > > second fix is implementation of Heikki' idea. > I'm guessing this issue is still unresolved? It would be nice to get this off the open items list. Thom
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/24/2014 08:16 AM, Tom Lane wrote: Jan Wieck writes: On 09/15/2014 09:46 PM, Craig Ringer wrote: Anyway - this is looking like the change will go in, and with it a catversion bump. Introduction of a jsonb version/flags byte might be worthwhile at the same time. It seems likely that there'll be more room for improvement in jsonb, possibly even down to using different formats for different data. Is it worth paying a byte per value to save on possible upgrade pain? If there indeed has to be a catversion bump in the process of this, then I agree with Craig. FWIW, I don't really. To begin with, it wouldn't be a byte per value, it'd be four bytes, because we need word-alignment of the jsonb contents so there's noplace to squeeze in an ID byte for free. Secondly, as I wrote in <15378.1408548...@sss.pgh.pa.us>: : There remains the : question of whether to take this opportunity to add a version ID to the : binary format. I'm not as excited about that idea as I originally was; : having now studied the code more carefully, I think that any expansion : would likely happen by adding more type codes and/or commandeering the : currently-unused high-order bit of JEntrys. We don't need a version ID : in the header for that. Moreover, if we did have such an ID, it would be : notationally painful to get it to most of the places that might need it. Heikki's patch would eat up the high-order JEntry bits, but the other points remain. If we don't need to be backwards-compatible with the 9.4beta on-disk format, we don't necessarily need to eat the high-order JEntry bit. You can just assume that every nth element is stored as an offset, and the rest as lengths. Although it would be nice to have the flag for it explicitly. There are also a few free bits in the JsonbContainer header that can be used as a version ID in the future. So I don't think we need to change the format to add an explicit version ID field.
- Heikki
Re: [HACKERS] add modulo (%) operator to pgbench
On 09/24/2014 10:45 AM, Fabien COELHO wrote: Currently these distributions are achieved by mapping a continuous function onto integers, so that neighboring integers get neighboring numbers of draws, say with size=7:

  #draws     10  6  3  1  0  0  0   // some exponential distribution
  int drawn   0  1  2  3  4  5  6

Although having an exponential distribution of accesses on tuples is quite reasonable, the likelihood that there would be so much correlation between neighboring values is not realistic at all. You need some additional shuffling to get there. I don't understand what that pseudo-random stage you're talking about is. Can you elaborate? The pseudo-random stage is just a way to scatter the values. A basic approach to achieve this is "i' = (i * large-prime) % size", if you have a modulo. For instance with prime=5 you may get something like:

  #draws     10  6  3  1  0  0  0
  int drawn   0  1  2  3  4  5  6   (i)
  scattered   0  5  3  1  6  4  2   (i' = 5 i % 7)

So the distribution becomes:

  #draws     10  1  0  3  0  6  0
  scattered   0  1  2  3  4  5  6

Which is more interesting from a testing perspective because it removes the neighboring-value correlation. Depends on what you're testing. Yeah, shuffling like that makes sense for a primary key. Or not: very often, recently inserted rows are also queried more often, so that there is indeed a strong correlation between the integer key and the access frequency. Or imagine that you have a table that stores the height of people in centimeters. To populate that, you would want to use a gaussian distributed variable, without shuffling. For shuffling, perhaps we should provide a pgbench function or operator that does that directly, instead of having to implement it using * and %. Something like hash(x, min, max), where x is the input variable (gaussian distributed, or whatever you want), and min and max are the range to map it to.
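The scattering step described above is easy to verify: i' = (i * prime) % size is a bijection on 0..size-1 whenever prime and size are coprime, so it only reorders the draw counts without changing their distribution. A minimal sketch (the helper name is made up for illustration):

```c
#include <assert.h>
#include <string.h>

/*
 * Multiplicative scatter: maps i in [0, size) to (i * prime) % size.
 * When gcd(prime, size) == 1 this is a permutation, which is exactly
 * what makes it usable for breaking neighboring-key correlation.
 */
int
scatter(int i, int prime, int size)
{
	return (int) (((long long) i * prime) % size);
}
```

With prime=5 and size=7 this reproduces the table in the email: 0, 5, 3, 1, 6, 4, 2.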
I must say that I'm appalled by a decision process which leads to such results, with significant patches passed, while the tiny complement that makes them really useful (I mean not on paper or on the feature list, but in real life) is rejected...

The idea of a modulo operator was not rejected; we'd just like to have the infrastructure in place first.

- Heikki
Re: [HACKERS] Extending COPY TO
On 09/24/2014 09:23 AM, Andrea Riciputi wrote:

Imagine you access PG from an application written in language X using a driver library, and your application and your PG instance run on two different hosts.

In that scenario, you'll be using the PQgetCopyData function to get the data. PQgetCopyData returns one row at a time; the application can trivially change the line ending to whatever it wants when writing the output to a file or wherever it goes.

As I wrote before, despite being a heavy PG user, it's my first time on the hackers ML and I don't want to seem disrespectful of the community.

No worries; thanks for the effort, even if this idea doesn't pan out.

- Heikki
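Heikki's point above, that the client rather than COPY TO can pick the line ending, can be sketched like this. Here `rows` stands in for the newline-terminated data rows a PQgetCopyData loop would hand back; the helper names are illustrative, not libpq API.

```python
import io

def write_rows(rows, out, eol=b"\r\n"):
    """Write COPY data rows with a caller-chosen line ending.

    Each element of 'rows' is one row as the driver returns it,
    terminated by the server's '\n'; we strip that and append
    whatever ending the application wants while writing.
    """
    for row in rows:
        out.write(row.rstrip(b"\n") + eol)

buf = io.BytesIO()
write_rows([b"1\tfoo\n", b"2\tbar\n"], buf)
print(buf.getvalue())  # b'1\tfoo\r\n2\tbar\r\n'
```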
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 24 August 2014 11:33, Amit Kapila wrote:

Thanks for your comments. I have worked on both review comment lists, sent on 19 August and 24 August. The latest patch is attached to this mail.

On 19 August:

>You can compare against SQLSTATE by using below API.
>val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
>You need to handle *42P01* SQLSTATE, also please refer below
>usage where we are checking SQLSTATE.
>fe-connect.c
>PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
> ERRCODE_INVALID_PASSWORD) == 0)

DONE

>Few more comments:
>
>1.
>* If user has not given the vacuum of complete db, then if
>
>I think here you have said reverse of what code is doing.
>You don't need *not* in above sentence.

DONE

>2.
>+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
>I think here you need to use function fmtQualifiedId() or fmtId()
>or something similar to handle quotes appropriately.

DONE

>3.
>+ */
>+ if (!r && !completedb)
>Here the usage of completedb variable is reversed which means
>that it goes into error path when actually whole database is
>getting vacuumed and the reason is that you are setting it
>to false in below code:
>+ /* Vaccuming full database*/
>+ vacuum_tables = false;

FIXED

>4.
>Functions prepare_command() and vacuum_one_database() contain
>duplicate code, is there any problem in using prepare_command()
>function in vacuum_one_database(). Another point in this context
>is that I think it is better to name function prepare_command()
>as append_vacuum_options() or something on that lines, also it will
>be better if you can write function header for this function as well.

DONE

>5.
>+ if (error)
>+ {
>+ for (i = 0; i < max_slot; i++)
>+ {
>+ DisconnectDatabase(&connSlot[i]);
>+ }
>
>Here why do we need DisconnectDatabase() type of function?
>Why can't we simply call PQfinish() as in base code?
Because the base code just needs to handle the main connection, and when PQfinish is sent it means either the work completed or there was an error; in both cases control is with the client. But in the multi-connection case, if one connection fails then we need to send a cancel on the other connections, and that is what DisconnectDatabase does.

>
>6.
>+ /*
>+ * if table list is not provided then we need to do vaccum for whole DB
>+ * get the list of all tables and prpare the list
>+ */
>spelling of prepare is wrong. I have noticed spell mistake
>in comments at some other place as well, please check all
>comments once

FIXED

>
>7. I think in new mechanism cancel handler will not work.
>In single connection vacuum it was always set/reset
>in function executeMaintenanceCommand(). You might need
>to set/reset it in function run_parallel_vacuum().

Good catch. Now I have called SetCancelConn(pSlot[0].connection) on the first connection. This enables the cancel handler to cancel the query on the first connection, so that the select loop will come out.

On 24 August:

>
>1. I could see one shortcoming in the way the patch has currently
> parallelized the work for --analyze-in-stages. Basically patch is
> performing the work for each stage for multiple tables in concurrent
> connections that seems okay for the cases when number of parallel
> connections is less than equal to number of tables, but for the case
> when user has asked for more number of connections than number of
> tables, then I think this strategy will not be able to use the extra
> connections.

I think --analyze-in-stages should maintain the order.

>2. Similarly for the case of multiple databases, currently it will not be able
> to use connections more than number of tables in each database because the
> parallelizing strategy is to just use the concurrent connections for
> tables inside single database.

I think for multiple databases, to do the parallel execution we need to maintain multiple connections with multiple databases.
And we need to maintain a table list for all the databases together to run them concurrently. I think this may hurt the startup cost, as we need to create and disconnect multiple connections just to prepare the list, and I think it will also increase the complexity.

>I am not completely sure whether current strategy is good enough or
>we should try to address the above problems. What do you think?

>3.
>+ do
>+ {
>+ i = select_loop(maxFd, &slotset);
>+ Assert(i != 0);
>
>Could you explain the reason of using this loop, I think you
>want to wait for data on socket descriptor, but why for maxFd?
>Also it is better if you explain this logic in comments.

We are sending vacuum job to
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
Hi,

On 2014-09-24 14:26:37 +0530, Abhijit Menon-Sen wrote:
> Thanks for your comments, and I'm sorry it's taken me so long to
> respond.
>
> At 2014-08-03 11:18:57 +0530, amit.kapil...@gmail.com wrote:
> >
> > After looking at code, I also felt that it is better to add this as a
> > version of pg_stattuple.
>
> I started off trying to do that, but now I'm afraid I strongly disagree.
> First, pg_stattuple works on relations and indexes, whereas fastbloat
> only works on relations (because of the VM/FSM use). Second, the code
> began to look hideous when mushed together, with too many ifs to avoid
> locking I didn't need and so on. The logic is just too different.

Why not add it to the stattuple extension, but as an independent function (and file)? I don't really see a need for a completely new extension; a separate extension seems wasteful.

> > 7.
> > HeapTupleSatisfiesVacuumNoHint()
> > a. Why can't we use HeapTupleSatisfiesVisibility() with dirty snapshot
> > as we use for pgstattuple?

I think it's completely unacceptable to copy a visibility routine. And I don't believe for a second that avoiding hint bit setting makes it legal to not acquire a content lock for this. What's your reasoning for that being safe? If you argue that all possible corruption has limited impact you need to do that *much* more explicitly and verbosely.

Greetings,

Andres Freund
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
Hi Amit.

Thanks for your comments, and I'm sorry it's taken me so long to respond.

At 2014-08-03 11:18:57 +0530, amit.kapil...@gmail.com wrote:
>
> After looking at code, I also felt that it is better to add this as a
> version of pg_stattuple.

I started off trying to do that, but now I'm afraid I strongly disagree. First, pg_stattuple works on relations and indexes, whereas fastbloat only works on relations (because of the VM/FSM use). Second, the code began to look hideous when mushed together, with too many ifs to avoid locking I didn't need and so on. The logic is just too different.

> Is this assumption based on the reason that if the visibility map bit
> of page is set, then there is high chance that vacuum would have
> pruned the dead tuples and updated FSM with freespace?

Right. I'm not convinced an explanation of the VM/FSM belongs in the fastbloat documentation, but I'll see if I can make it clearer.

> 1. compilation errors […]
> I think this is not a proper multi-line comment. […]
> It is better to have CHECK_FOR_INTERRUPTS() in above loop. […]
> Incase of new page don't you need to account for freespace. […]
> 5. Don't we need the handling for empty page (PageIsEmpty()) case?

Thanks, have fixed, will push updates soon.

> What is the reason for not counting ItemIdIsDead() case for
> accounting of dead tuples.

Will reconsider and fix.

> 7.
> HeapTupleSatisfiesVacuumNoHint()
> a. Why can't we use HeapTupleSatisfiesVisibility() with dirty snapshot
> as we use for pgstattuple?

Heavier locking. I tried to make do with the existing HTS* functions, but fastbloat was noticeably faster in tests with the current setup.

> b. If we want to use HeapTupleSatisfiesVacuum(), why can't we add one
> parameter to function which can avoid setting of hintbit.

This is certainly a possibility, and as you say it would be better in terms of maintenance. I didn't do it that way because I wanted to keep the extension self-contained rather than make it depend on core changes.
If there's consensus on adding a nohint parameter, sure, I can do that. (But then fastbloat won't work on current versions, which would suck.) In the meantime, I'll merge the later updates from HTSVacuum into my code. Thanks for the heads-up.

> function header of vac_estimate_reltuples() suggests that it is
> used by VACUUM and ANALYZE, I think it will be better to update
> the comment w.r.t new usage as well.

OK.

> 10. I think it should generate resource file as is done for other
> modules if you want to keep it as a separate module.

Thanks, will do.

> Do you really need to specify examples in README.

I don't *need* to, but I like it.

-- Abhijit
Re: [HACKERS] pgcrypto: PGP signatures
At 2014-09-15 13:37:48 +0200, ma...@joh.to wrote:
>
> I'm not sure we're talking about the same thing.

No, we weren't. I was under the impression that the signatures could be validated. Sorry for the noise.

-- Abhijit
Re: [HACKERS] add modulo (%) operator to pgbench
Hello Heikki,

If you reject it, you can also remove the gaussian and exponential random distributions, which are near useless without a means to add a minimal pseudo-random stage, for which "(x * something) % size" is a reasonable approximation; hence the modulo submission.

I'm confused. The gaussian and exponential patch was already committed.

Yes. They are significant patches that really involved significant work, and which are only really useful with a complementary stupid 10-line patch which is being rejected without understanding why it is needed.

Are you saying that it's not actually useful, and should be reverted? That doesn't make sense to me; gaussian- and exponential-distributed values seem quite useful in test scripts.

Yes and no. Currently these distributions are achieved by mapping a continuous function onto integers, so that neighboring integers get neighboring numbers of draws, say with size=7:

#draws    10 6 3 1 0 0 0   // some exponential distribution
int drawn  0 1 2 3 4 5 6

Although having an exponential distribution of accesses on tuples is quite reasonable, the likelihood that there would be so much correlation between neighboring values is not realistic at all. You need some additional shuffling to get there.

I don't understand what that pseudo-random stage you're talking about is. Can you elaborate?

The pseudo-random stage is just a way to scatter the values. A basic approach to achieve this is "i' = (i * large-prime) % size", if you have a modulo. For instance with prime=5 you may get something like:

#draws    10 6 3 1 0 0 0
int drawn  0 1 2 3 4 5 6   (i)
scattered  0 5 3 1 6 4 2   (i' = 5 i % 7)

So the distribution becomes:

#draws    10 1 0 3 0 6 0
scattered  0 1 2 3 4 5 6

Which is more interesting from a testing perspective because it removes the neighboring-value correlation.

A better approach is to use a hash function: "i' = hash(i) % size". It skews the distribution some more, but the quality of the shuffling is much better than with the mult-modulo version above. Note that you need a modulo as well...

I must say that I'm appalled by a decision process which leads to such results, with significant patches passed, while the tiny complement that makes them really useful (I mean not on paper or on the feature list, but in real life) is rejected...

-- Fabien.
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On 09/23/2014 10:04 PM, Alvaro Herrera wrote:

+ The BRIN implementation in PostgreSQL
+ is primarily maintained by Álvaro Herrera.

We don't usually have such verbiage in the docs. The GIN and GiST pages do, but I think those are historical exceptions, not something we want to do going forward.

+ BrinOpcInfo *opcInfo(void)
+
+ Returns internal information about the indexed columns' summary data.

I think you should explain what that internal information is. The minmax-19a.patch adds the type OID argument to this; remember to update the docs.

In SP-GiST, the similar function is called "config". It might be good to use the same name here, for consistency across indexams (although I actually like the "opcInfo" name better than "config").

The docs for the other support functions need to be updated, now that you changed the arguments from DeformedBrTuple to BrinValues.

+ To implement these methods in a generic ways, normally the opclass
+ defines its own internal support functions. For instance, minmax
+ opclasses add the support functions for the four inequality operators
+ for the datatype.
+ Additionally, the operator class must supply appropriate
+ operator entries,
+ to enable the optimizer to use the index when those operators are
+ used in queries.

The above needs improvement ;-)

+BRIN indexes (a shorthand for Block Range indexes)
+store summaries about the values stored in consecutive table physical block ranges.

"consecutive table physical block ranges" is quite a mouthful.

+For datatypes that have a linear sort order, the indexed data
+corresponds to the minimum and maximum values of the
+values in the column for each block range,
+which support indexed queries using these operators:
+
+ <
+ <=
+ =
+ >=
+ >

That's the built-in minmax indexing strategy, yes, but you could have others, even for datatypes with a linear sort order.
+ To find out the index tuple for a particular page range, we have an internal

s/find out/find/

+ new heap tuple contains null values but the index tuple indicate there are no

s/indicate/indicates/

+ Open questions
+ --
+
+ * Same-size page ranges?
+ Current related literature seems to consider that each "index entry" in a
+ BRIN index must cover the same number of pages. There doesn't seem to be a

What is the related literature? Is there an academic paper or something that should be cited as a reference for BRIN?

+ * TODO
+ ** ScalarArrayOpExpr (amsearcharray -> SK_SEARCHARRAY)
+ ** add support for unlogged indexes
+ ** ditto expressional indexes

We don't have unlogged indexes in general, so no need to list that here. What would be needed to implement ScalarArrayOpExprs? I didn't realize that expression indexes are still not supported. And I see that partial indexes are not supported either. Why not? I wouldn't expect BRIN to need to care about those things in particular; the expressions for an expressional or partial index are handled in the executor, no?

+ /*
+ * A tuple in the heap is being inserted. To keep a brin index up to date,
+ * we need to obtain the relevant index tuple, compare its stored values with
+ * those of the new tuple; if the tuple values are consistent with the summary
+ * tuple, there's nothing to do; otherwise we need to update the index.

s/compare/and compare/. Perhaps replace one of the semicolons with a full stop.

+ * If the range is not currently summarized (i.e. the revmap returns InvalidTid
+ * for it), there's nothing to do either.
+ */
+ Datum
+ brininsert(PG_FUNCTION_ARGS)

There is no InvalidTid, as a constant or a #define. Perhaps replace with "invalid item pointer".

+ /*
+ * XXX We need to know the size of the table so that we know how long to
+ * iterate on the revmap. There's room for improvement here, in that we
+ * could have the revmap tell us when to stop iterating.
+ */

The revmap doesn't know how large the table is.
Remember that you have to return all blocks that are not in the revmap, so you can't just stop when you reach the end of the revmap. I think the current design is fine.

I have to stop now to do some other stuff. Overall, this is in pretty good shape. In addition to a little cleanup of the things I listed above, and similar stuff elsewhere that I didn't read through right now, there are a few medium-sized items I'd still like to see addressed before you commit this:

* expressional/partial index support
* the difficulty of testing the union support function that we discussed earlier
* clarifying the memory-context handling of support functions, which we also discussed earlier

- Heikki
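The brininsert logic quoted in the review above — obtain the range's summary, compare it with the new tuple's values, and update the index only when they are inconsistent — can be sketched for the minmax case. This is a conceptual sketch in Python, not the patch's C code; `None` stands in for a range the revmap reports as unsummarized.

```python
def minmax_insert(summary, value):
    """Return (new_summary, index_update_needed) for one inserted value.

    'summary' is a (min, max) pair for the value's block range, or
    None when the revmap says the range is not summarized.
    """
    if summary is None:
        return None, False          # unsummarized range: nothing to do
    lo, hi = summary
    if lo <= value <= hi:
        return summary, False       # consistent with summary: no update
    return (min(lo, value), max(hi, value)), True

print(minmax_insert((3, 9), 5))     # ((3, 9), False)
print(minmax_insert((3, 9), 12))    # ((3, 12), True)
```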