[HACKERS] CREATE FUNCTION .. LEAKPROOF docs

2015-06-14 Thread Dean Rasheed
I think the docs for the LEAKPROOF option in create_function.sgml
ought to mention RLS as well as security barrier views. Also the
current text is no longer strictly correct in light of commit
dcbf5948e12aa60b4d6ab65b6445897dfc971e01. Suggested rewording
attached.

Regards,
Dean
diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
new file mode 100644
index c5beb16..cc2098c
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -350,9 +350,18 @@ CREATE [ OR REPLACE ] FUNCTION
effects.  It reveals no information about its arguments other than by
its return value.  For example, a function which throws an error message
for some argument values but not others, or which includes the argument
-   values in any error message, is not leakproof.   The query planner may
-   push leakproof functions (but not others) into views created with the
-   <literal>security_barrier</literal> option.  See
+   values in any error message, is not leakproof.  This affects how the
+   system executes queries against views created with the
+   <literal>security_barrier</literal> option or tables with row level
+   security enabled.  The system will enforce conditions from security
+   policies and security barrier views before any user-supplied conditions
+   from the query itself that contain non-leakproof functions, in order to
+   prevent the inadvertent exposure of data.  Functions and operators
+   marked as leakproof are assumed to be trustworthy, and may be executed
+   before conditions from security policies and security barrier views.
+   In addition, functions which do not take arguments or which are not
+   passed any arguments from the security barrier view or table do not have
+   to be marked as leakproof to be executed before security conditions.  See
    <xref linkend="sql-createview"> and <xref linkend="rules-privileges">.
This option can only be set by the superuser.
   </para>
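
For illustration, a minimal SQL sketch (hypothetical table, view and
function names) of the behaviour the reworded paragraph describes:

    CREATE TABLE accounts (id int, owner text, balance numeric);
    INSERT INTO accounts VALUES (1, 'alice', 100), (2, 'bob', 200);

    CREATE FUNCTION leaky(numeric) RETURNS boolean AS $$
    BEGIN
      RAISE NOTICE 'saw %', $1;   -- reveals its argument, hence not leakproof
      RETURN true;
    END;
    $$ LANGUAGE plpgsql;

    CREATE VIEW my_accounts WITH (security_barrier) AS
      SELECT * FROM accounts WHERE owner = current_user;

    -- leaky(balance) takes an argument from the view, so it is evaluated
    -- only after the owner condition; rows the view filters out never
    -- reach it.
    SELECT * FROM my_accounts WHERE leaky(balance);

    -- leaky(10) is passed no argument from the view, so per commit
    -- dcbf5948 it may be evaluated before the security condition even
    -- though it is not marked LEAKPROOF.
    SELECT * FROM my_accounts WHERE leaky(10);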



Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Dean Rasheed
On 11 June 2015 at 05:15, Bruce Momjian br...@momjian.us wrote:
 I have committed the first draft of the 9.5 release notes.  You can view
 the output here:

 http://momjian.us/pgsql_docs/release-9-5.html


I think it's worth mentioning
dcbf5948e12aa60b4d6ab65b6445897dfc971e01, probably under General
Performance. It's an optimisation, and also a user-visible change to
the way LEAKPROOF works. Perhaps something like

Allow pushdown of non-leakproof functions into security barrier views
if the function is not passed any arguments from the view.

Previously only functions marked as LEAKPROOF could be pushed down
into security barrier views.

Regards,
Dean




Re: [HACKERS] s_lock() seems too aggressive for machines with many sockets

2015-06-14 Thread Jan Wieck
The whole thing turns out to be based on wrong baseline data, taken with 
a pgbench client running from a remote machine. It all started out from 
an investigation against 9.3. Curiously enough, the s_lock() problem 
that existed in 9.3 has a very similar effect on throughput as a network 
bottleneck has on 9.5.


Sorry for the noise, Jan


On 06/10/2015 09:18 AM, Jan Wieck wrote:

Hi,

I think I may have found one of the problems PostgreSQL has on machines
with many NUMA nodes. I am not yet sure what exactly happens on the NUMA
bus, but there seems to be a tipping point at which the spinlock
concurrency wreaks havoc and the performance of the database collapses.

On a machine with 8 sockets, 64 cores, Hyperthreaded 128 threads total,
a pgbench -S peaks with 50-60 clients around 85,000 TPS. The throughput
then takes a very sharp dive and reaches around 20,000 TPS at 120
clients. It never recovers from there.

The attached patch demonstrates that less aggressive spinning and (much)
more often delaying improves the performance on this type of machine.
The 8 socket machine in question scales to over 350,000 TPS.

The patch is meant to demonstrate this effect only. It has a negative
performance impact on smaller machines and client counts < #cores, so
the real solution will probably look much different. But I thought it
would be good to share this and start the discussion about reevaluating
the spinlock code before PGCon.


Regards, Jan




--
Jan Wieck
Senior Software Engineer
http://slony.info




Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Sat, Jun 13, 2015 at 11:30:30PM -0400, Bruce Momjian wrote:
   Note that I guess MauMau is a nickname.
  
  I think we are fine consistently putting Japanese last names first _if_
  we always capitalize the last name, e.g. FUJITA Etsuro --- is that a
  good idea? Does everyone like that?  Does any other country want that?
 
 OK, new idea.  What about, instead of having the last name be all-caps,
 we have the last name start with an uppercase letter, then use smallcaps
 for the rest of the last name:
 
   https://en.wikipedia.org/wiki/Small_caps
 
 That way, the last name will not appear too large, but will be clear as
 something different from other names.  Peter, I assume small-caps is
 possible.

I realize this might come across as overkill, but, well, we actually
have this database system for anyone who has an account on
postgresql.org, the wiki, been to one of the conferences that uses the
PG auth system, etc.

Perhaps we just need to add a "display as" field or similar to that
system?  Or have a check-box for each individual to indicate if they'd
prefer to use 'First Last' or 'LAST First'.  We already track first and
last names as different fields.  We also have a unique username field
and we might be able to come up with a system along the lines of
'user:username' which is then replaced by whatever is in the
"display as" field in the database.  My thought would be for that to be
a one-time thing which is then reviewed and then the release notes are
published as they are today.  I don't think we'd want to dynamically
change the release notes based on this on the actual webpage, for
various reasons.

My 2c on the whole thing is that it should be up to the individual and
not based on country, as I can imagine individuals who would be annoyed
if their name started showing up in the release notes as
'FROST Stephen' simply because they moved to Japan (no, I don't have any
intentions of doing so, just saying).

Thanks,

Stephen




Re: [HACKERS] Collection of memory leaks for ECPG driver

2015-06-14 Thread Michael Paquier
On Sat, Jun 13, 2015 at 6:25 PM, Michael Meskes mes...@postgresql.org wrote:
 On Sat, Jun 13, 2015 at 12:02:40AM -0400, Tom Lane wrote:
 But having said that, I would not be in a hurry to remove any existing
 if-guards for the case.  For one thing, it makes the code look more
 similar to backend code that uses palloc/pfree, where we do *not* allow
 pfree(NULL).  There's also the point that changing longstanding code
 creates back-patching hazards, so unless there's a clear gain it's best
 not to.

 Makes sense, but there is no point in adding those if-guards to old code that
 doesn't have it either, right?

In any case, whatever the final decision here, I just wanted to
point out that there is still a leak in connect.c.  See the attached
patch, which does not check for a NULL pointer before ecpg_free because
other code paths in the patched routine don't do so, so the result is at
least locally consistent ;)
-- 
Michael
diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 55c5680..e45d17f 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -321,7 +321,10 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 	}
 
 	if ((this = (struct connection *) ecpg_alloc(sizeof(struct connection), lineno)) == NULL)
+	{
+		ecpg_free(dbname);
 		return false;
+	}
 
 	if (dbname != NULL)
 	{



Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Bruce Momjian
On Sat, Jun 13, 2015 at 11:30:30PM -0400, Bruce Momjian wrote:
  Note that I guess MauMau is a nickname.
 
 I think we are fine consistently putting Japanese last names first _if_
 we always capitalize the last name, e.g. FUJITA Etsuro --- is that a
 good idea? Does everyone like that?  Does any other country want that?

OK, new idea.  What about, instead of having the last name be all-caps,
we have the last name start with an uppercase letter, then use smallcaps
for the rest of the last name:

https://en.wikipedia.org/wiki/Small_caps

That way, the last name will not appear too large, but will be clear as
something different from other names.  Peter, I assume small-caps is
possible.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-14 Thread Magnus Hagander
On Fri, Jun 12, 2015 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Andrew Dunstan and...@dunslane.net writes:
  On 06/12/2015 09:31 AM, Robert Haas wrote:
  Could we update our git hook to refuse a push of a new commit whose
  timestamp is more than, say, 24 hours in the past?  Our commit history
  has some timestamps in it now that are over a month off, and it's
  really easy to do, because when you rebase a commit, it keeps the old
  timestamp.  If you then merge or cherry-pick that into an official
  branch rather than patch + commit, you end up with this problem unless
  you are careful to fix it by hand.  It would be nice to prevent
  further mistakes of this sort, as they create confusion.

  I think 24 hours is probably fairly generous,

 Yeah ... if we're going to try to enforce a linear-looking history, ISTM
 the requirement ought to be newer than the latest commit on the same
 branch.  Perhaps that would be unduly hard to implement though.


From a quick look at our existing script, I think that's doable, but I'd
have to do some more detailed verification before I'm sure. And we'd have
to figure out some way to deal with a push with multiple commits in it, but
it should certainly be doable if the first one is.

Would we in that case want to enforce linearity *and* recent-ness, or just
linearity? as in, do we care about the commit time even if it doesn't
change the order?



FWIW, our git_changelog script tries to avoid this problem by paying
 attention to CommitDate not Date.  But I agree that it's confusing when
 those fields are far apart.


That brings it back to the enforcing - would we want to enforce both those?

(And to confuse it even more, Date gets renamed to AuthorDate when you do a
full log.. But AFAIK it's the same date, they just change the name of it)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Petr Jelinek

On 2015-06-14 03:02, Bruce Momjian wrote:

On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote:

Petr Jelinek wrote:

Hi,

+  listitem
+   para
+Add typeJSONB/ functions functionjsonb_set()/ and
+functionjsonb_pretty/ (Dmitry Dolgov, Andrew Dunstan)
+   /para
+  /listitem

I believe I should be 3rd author for this one as I rewrote large parts of
this functionality as part of the review.


Also, it looks like we could spell your last name with an accent,
assuming the i-acute character in Latin1 is fine.


Ah, you are correct.  I found a few commits that did have that accent.
All _seven_ of Petr's 9.5 commit mentions updated to add the accent.
:-)  Thanks.



Yes i-accute is correct, thanks. I don't really mind my name being 
without the accent either though :)


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-14 Thread Shay Rojansky
Tom, I agree this is entirely a client-side issue. Regardless, as Simon
says it would be useful to have some documentation for client implementors.

Sehrope, thanks for the JDBC link! I was actually thinking of going about
it another way in Npgsql:

   1. Send messages normally until the first Execute message is sent.
   2. From that point on, socket writes should simply be non-blocking. As
   long as buffers aren't full, there's no issue, we continue writing. The
   moment a non-blocking write exits because it would block, we transfer
   control to the user, who can now read data from queries (the ADO.NET API
   allows for multiple resultsets).
   3. When the user finishes processing the resultsets, control is
   transferred back to Npgsql which continues sending messages (back to step
   1).

This approach has the advantage of not caring about buffer sizes or trying
to assume how many bytes are sent by the server: we simply write as much as
we can without blocking, then switch to reading until we've exhausted
outstanding data, and back to writing. The main issue I'm concerned about
is SSL/TLS, which is a layer on top of the sockets and which might not work
well with non-blocking sockets...

Any comments?

Shay

On Sat, Jun 13, 2015 at 5:08 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 12 June 2015 at 20:06, Tom Lane t...@sss.pgh.pa.us wrote:

 Simon Riggs si...@2ndquadrant.com writes:
  On 11 June 2015 at 22:12, Shay Rojansky r...@roji.org wrote:
  Just in case it's interesting to you... The reason we implemented
 things
  this way is in order to avoid a deadlock situation - if we send two
 queries
  as P1/D1/B1/E1/P2/D2/B2/E2, and the first query has a large resultset,
  PostgreSQL may block writing the resultset, since Npgsql isn't reading
 it
  at that point. Npgsql on its part may get stuck writing the second
 query
  (if it's big enough) since PostgreSQL isn't reading on its end (thanks
 to
  Emil Lenngren for pointing this out originally).

  That part does sound like a problem that we have no good answer to.
 Sounds
  worth starting a new thread on that.

 I do not accept that the backend needs to deal with that; it's the
 responsibility of the client side to manage buffering properly if it is
 trying to overlap sending the next query with receipt of data from a
 previous one.  See commit 2a3f6e368 for a related issue in libpq.


 Then it's our responsibility to define what manage buffering properly
 means and document it.

 People should be able to talk to us without risk of deadlock.

 --
 Simon Riggshttp://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services



[HACKERS] Tab completion for TABLESAMPLE

2015-06-14 Thread Petr Jelinek

Hi,

looks like I omitted psql tab completion from the TABLESAMPLE patch. The 
attached patch adds it.
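
For reference, the completion kicks in right after the TABLESAMPLE
keyword, e.g. (hypothetical table name):

    SELECT * FROM big_table TABLESAMPLE system (1);       -- ~1% of pages
    SELECT * FROM big_table TABLESAMPLE bernoulli (10);   -- ~10% of rows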


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b9f5acc..5801a06 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -737,6 +737,11 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   FROM pg_catalog.pg_event_trigger "\
 "  WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'"
 
+#define Query_for_list_of_tablesample_methods \
+" SELECT pg_catalog.quote_ident(tsmname) "\
+"   FROM pg_catalog.pg_tablesample_method "\
+"  WHERE substring(pg_catalog.quote_ident(tsmname),1,%d)='%s'"
+
 /*
  * This is a list of all things in Pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
@@ -3578,6 +3583,10 @@ psql_completion(const char *text, int start, int end)
 			 prev2_wd[0] == '\0')
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_relations, NULL);
 
+/* TABLESAMPLE */
+	else if (pg_strcasecmp(prev_wd, "TABLESAMPLE") == 0)
+		COMPLETE_WITH_QUERY(Query_for_list_of_tablesample_methods);
+
 /* TRUNCATE */
 	else if (pg_strcasecmp(prev_wd, "TRUNCATE") == 0)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);



Re: [HACKERS] Problem with CREATE TABLE ... (LIKE ... INCLUDING INDEXES)

2015-06-14 Thread Thom Brown
On 14 June 2015 at 04:25, Michael Paquier michael.paqu...@gmail.com wrote:
 On Sun, Jun 14, 2015 at 11:38 AM, Thom Brown t...@linux.com wrote:
 As you can see, 3 indexes are missing, which happen to be ones that
 would duplicate the column definition of another index.  Is this
 intentional?  If so, shouldn't it be documented behaviour?

 Looking at the code (transformIndexConstraints in parse_utilcmd.c),
 this is intentional behavior:
 /*
  * Scan the index list and remove any redundant index
 specifications. This
  * can happen if, for instance, the user writes UNIQUE PRIMARY KEY. A
  * strict reading of SQL would suggest raising an error
 instead, but that
  * strikes me as too anal-retentive. - tgl 2001-02-14
  *
  * XXX in ALTER TABLE case, it'd be nice to look for duplicate
  * pre-existing indexes, too.
  */
 Per this commit:
 commit: c7d2ce7bc6eb02eac0c10fae9caf2936a71ad25c
 author: Tom Lane t...@sss.pgh.pa.us
 date: Wed, 14 Feb 2001 23:32:38 +
 Repair problems with duplicate index names generated when CREATE TABLE
 specifies redundant UNIQUE conditions.

 Perhaps a mention in the docs in the page of CREATE TABLE would be
 welcome. Something like Redundant index definitions are ignored with
 INCLUDING INDEXES.

 Thoughts?

The commit refers to duplicate index names, and only for UNIQUE
indexes.  This behaviour is beyond that.  And how does it determine
which index to copy?  In my example, I placed an index in a different
tablespace.  That could be on a drive with very different read/write
characteristics than the default tablespace (seek latency/sequential
read rate/write speed etc.) and possibly with different GUC
parameters, but there's no way for us to determine if this is the
case, so Postgres can easily remove the more performant one.
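
A minimal sketch (hypothetical names) of the behaviour in question:

    CREATE TABLE t (a int, b int);
    CREATE INDEX t_a_idx  ON t (a);
    CREATE INDEX t_a_idx2 ON t (a);          -- redundant definition
    CREATE TABLE t_copy (LIKE t INCLUDING INDEXES);
    \d t_copy                                -- only one index on (a) is copied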

-- 
Thom




Re: [HACKERS] On columnar storage

2015-06-14 Thread Andres Freund
On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote:
 Rewriter
 
 Parsing occurs as currently.  During query rewrite, specifically at the
 bottom of the per-relation loop in fireRIRrules(), we will modify the
 query tree: each relation RTE containing a colstore will be replaced
 with a JoinExpr containing the relation as left child and the colstore
 as right child (1).  The colstore RTE will be of a new RTEKind.  For
 each such change, all Var nodes that point to attnums stored in the
 colstore will be modified so that they reference the RTE of the colstore
 instead (2).

FWIW, I think this is a pretty bad place to tackle this. For one I think
we shouldn't add more stuff using the rewriter unless it's clearly the
best interface. For another, doing things in the rewriter will make
optimizing things much harder - the planner will have to reconstruct
knowledge which of the joins are column store joins and such.

Why do you want to do things there?

Greetings,

Andres Freund




Re: [HACKERS] On columnar storage

2015-06-14 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote:
 Parsing occurs as currently.  During query rewrite, specifically at the
 bottom of the per-relation loop in fireRIRrules(), we will modify the
 query tree: each relation RTE containing a colstore will be replaced
 with a JoinExpr containing the relation as left child and the colstore
 as right child (1).  The colstore RTE will be of a new RTEKind.  For
 each such change, all Var nodes that point to attnums stored in the
  colstore will be modified so that they reference the RTE of the colstore
 instead (2).

 FWIW, I think this is a pretty bad place to tackle this. For one I think
 we shouldn't add more stuff using the rewriter unless it's clearly the
 best interface. For another, doing things in the rewriter will make
 optimizing things much harder - the planner will have to reconstruct
 knowledge which of the joins are column store joins and such.

As a comparison point, one of my Salesforce colleagues just put in
a somewhat similar though single-purpose thing, to expand what originally
is a simple table reference into a join (against a system catalog that's
nowhere mentioned in the original query).  In our case, we wanted to force
a scan on a large table to have a constraint on the leading primary key
column; if the query has such a constraint already, then fine, else create
one by joining to a catalog that lists the allowed values of that column.
We started out by trying to do it in the rewriter, and that didn't work
well at all.  We ended up actually doing it at createplan.c time, which
is conceptually ugly, but there was no good place to do it earlier without
duplicating a lot of indexqual analysis.  But the thing that made that
painful was that the transformation was optional, and indeed might happen
or not happen for a given query depending on the selected plan shape.
AFAICT the transformation Alvaro is proposing is unconditional, so
it might be all right to do it in the rewriter.  As you say, if the
planner needs to reconstruct what happened, that would be a strike
against this way, but it's not clear from here whether any additional
info is needed beyond the already-suggested extra RTEKind.

Another model that could be followed is expansion of inheritance-tree
references, which happens early in the planner.  In that case the
planner does keep additional information about what it did (the appendrel
data structures), so that could be a good model if this code needs to
do likewise.

The existing join-alias-var flattening logic in the planner might be of
interest as well for the variable-substitution business, which I suspect
is the main reason Alvaro is proposing doing it in the rewriter.

regards, tom lane




Re: [HACKERS] On columnar storage

2015-06-14 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Won't this cause issues to MergeAppend optimizations?

 Like what?

 Well, as I understand, MergeAppend needs to know the sort order of the
 child node, right?  But that's available only on the relation RTE, not
 on the colstore-join RTE.

Uh, what?  Sort order is a property of a path, not an RTE.  And we
have always understood which join types preserve sort order.

 And if there are such issues, why do you think you wouldn't be
 expected to solve them?

 Precisely.  If I simply reject having column stores in partitioned
 tables, then I don't *need* to solve them.

You misunderstood the thrust of my comment, which basically is that
I doubt anyone will think that rejecting that combination is an
acceptable implementation restriction.  It might be all right if it
doesn't work very well in v0, but not if the implementation is designed
so that it can never be fixed.

regards, tom lane




Re: [HACKERS] Problem with CREATE TABLE ... (LIKE ... INCLUDING INDEXES)

2015-06-14 Thread Tom Lane
Thom Brown t...@linux.com writes:
 The commit refers to duplicate index names, and only for UNIQUE
 indexes.  This behaviour is beyond that.  And how does it determine
 which index to copy?  In my example, I placed an index in a different
 tablespace.  That could be on a drive with very different read/write
 characteristics than the default tablespace (seek latency/sequential
 read rate/write speed etc.) and possibly with different GUC
 parameters, but there's no way for us to determine if this is the
 case, so Postgres can easily remove the more performant one.

TBH, I have no particular concern for this argument.  If you created
duplicate indexes you did a dumb thing anyway; you should not be expecting
that the system's response to that situation will be remarkably
intelligent.  As the comment indicates, the code in question is really
only meant to deal with a specific kind of redundancy we'd observed in
real-world CREATE TABLE commands.  It's probably accidental that it gets
applied in CREATE TABLE LIKE cases, but it doesn't bother me that it is.

regards, tom lane




[HACKERS] pg_resetsysid

2015-06-14 Thread Petr Jelinek

Hi,

This is an attempt to revive the resetsysid tool submission that didn't 
go anywhere last year [1].


In that thread there were mainly two complaints: one was that pg_resetxlog 
should preferably remain a tool for more advanced stuff, and that similar 
but less dangerous utilities should have separate binaries reusing some of 
pg_resetxlog's functionality as a library; the other concerned the string 
to uint64 parsing, which is apparently not so easy to do portably. 
About the second one there was also a separate discussion between Andres 
and Tom [2].


Here is a series of patches that implements the original feature and 
solves the issues mentioned above.


0001 - This patch splits the code of pg_resetxlog into a common.c/h API and 
pg_resetxlog.c itself, which implements the program logic using that API. 
The only logical change is that the XLogSegNoOffsetToRecPtr call is moved 
into the main() function, which I think is OK as main() does all the other 
ControlFile changes.  All other changes should be just cosmetic, made in 
order to not rely on global static variables.


0002 - Adds the pg_resetsysid utility, which changes the system id to a 
newly generated one.


0003 - Adds a -s option to pg_resetxlog to change the system id to the one 
specified - this is separate from the previous patch as it is potentially 
more dangerous.  It also adds some defines that try to portably map 
pg_strtoint64 and pg_strtouint64 to the underlying implementations 
(strtol/strtoll/_strtoi64 and the unsigned variants of those).  It leaves 
pg_strtoint64/pg_strtouint64 undefined if the platform does not support 
the functionality.  Tom was unsure if there are any such platforms on 
our supported list anymore.


Which is why I also added 0004, which adds fall-back implementations of 
the two functions based on very slightly modified code from OpenBSD. 
This is here mostly in case the buildfarm goes red after applying 0003 or 
we get reports from users that things got broken.


[1] http://www.postgresql.org/message-id/539b97fc.8040...@2ndquadrant.com
[2] 
http://www.postgresql.org/message-id/20140603144654.gm24...@awork2.anarazel.de


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From f587fd47c9e43f5c08cf840978ff21b64f39d24d Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Sun, 14 Jun 2015 16:23:39 +0200
Subject: [PATCH 4/4] Add fallback implementations of pg_strto(u)int64.

---
 src/include/c.h  |   6 ++
 src/port/Makefile|   2 +-
 src/port/pgstrtoint64.c  | 169 +++
 src/port/pgstrtouint64.c | 129 
 4 files changed, 305 insertions(+), 1 deletion(-)
 create mode 100644 src/port/pgstrtoint64.c
 create mode 100644 src/port/pgstrtouint64.c

diff --git a/src/include/c.h b/src/include/c.h
index 537b4d0..a01b455 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1102,6 +1102,9 @@ extern int	fdatasync(int fildes);
 #define pg_strtoint64 strtoll
 #elif defined(WIN32)
 #define pg_strtoint64 _strtoi64
+#else
+extern int64 pg_strtoint64(const char *nptr, char **endptr, int base);
+#define PGSTRTOINT64_IMPL 1
 #endif
 
 /* Define portable pg_strtouint64() */
@@ -,6 +1114,9 @@ extern int	fdatasync(int fildes);
 #define pg_strtouint64 strtoull
 #elif defined(WIN32)
 #define pg_strtouint64 _strtoui64
+#else
+extern uint64 pg_strtouint64(const char *nptr, char **endptr, int base);
+#define PGSTRTOUINT64_IMPL 1
 #endif
 
 /*
diff --git a/src/port/Makefile b/src/port/Makefile
index bc9b63a..91e4a2b 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -32,7 +32,7 @@ LIBS += $(PTHREAD_LIBS)
 
 OBJS = $(LIBOBJS) $(PG_CRC32C_OBJS) chklocale.o erand48.o inet_net_ntop.o \
 	noblock.o path.o pgcheckdir.o pgmkdirp.o pgsleep.o \
-	pgstrcasecmp.o pqsignal.o \
+	pgstrcasecmp.o pgstrtoint64.o pgstrtouint64.o pqsignal.o \
 	qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o
 
 # foo_srv.o and foo.o are both built from foo.c, but only foo.o has -DFRONTEND
diff --git a/src/port/pgstrtoint64.c b/src/port/pgstrtoint64.c
new file mode 100644
index 000..e968df0
--- /dev/null
+++ b/src/port/pgstrtoint64.c
@@ -0,0 +1,169 @@
+/*-
+ *
+ * pgstrtoint64.c
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/port/pgstrtoint64.c
+ *
+ * Based on OpenBSD strtoll() implementation.
+ * Differences from vanilla implementation:
+ *  - renamed to pg_strtoint64
+ *  - returns int64 instead of long long
+ *  - internal params defined as int64 instead of long long
+ *  - uses PG_INT64_MIN/PG_INT64_MAX
+ *
+ * The OpenBSD copyright terms follow.
+ */
+
+/* $OpenBSD: strtoll.c,v 1.7 2013/03/28 18:09:38 martynas Exp $ */
+/*-
+ * Copyright (c) 1992 The Regents of the University of California.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or 

Re: [HACKERS] On columnar storage

2015-06-14 Thread Tomas Vondra

Hi,

On 06/13/15 00:07, Michael Nolan wrote:



On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

We hope to have a chance to discuss this during the upcoming developer
unconference in Ottawa.  Here are some preliminary ideas to shed some
light on what we're trying to do.


I've been trying to figure out a plan to enable native column stores
(CS or colstore) for Postgres.  Motivations:

* avoid the 32 TB limit for tables
* avoid the 1600 column limit for tables
* increased performance

Are you looking to avoid all hardware-based limits, or would using a 64
bit row pointer be possible?  That would give you 2^64 or 1.8 E19 unique
rows over whatever granularity/uniqueness you use (per table, per
database, etc.)
--
Mike Nolan.


I don't think the number of tuples is the main problem here, it's the 
number of pages a single relation can have. Looking at the numbers of 
rows as a direct function of TID size is misleading, because the TID is 
split into two fixed parts - page number (32b) and tuple number (16b).


For the record, 2^48 is 281,474,976,710,656, which ought to be enough for 
anybody, but we waste a large part of that because we assume there might 
be up to 2^16 tuples per page, although the actual limit is way lower 
(~290 for 8kB pages, and ~1200 for 32kB pages).


So we can only have ~4 billion pages, which is where the 32TB limit 
comes from (with 32kB pages it's 128TB).


Longer TIDs are one straightforward way to work around this limit, 
assuming you add the bits to the 'page number' field. Adding 16 bits 
(thus using 64-bit pointers) would increase the limit 2^16-times to 
about 2048 petabytes (with 8kB pages). But that of course comes with a 
cost, because you have to keep those larger TIDs in indexes etc.


Another option might be to split the 48 bits differently, by moving 5 
bits to the page number part of TID (so that we expect ~2048 tuples per 
page at most). That'd increase the limit to 1PB (4PB with 32kB pages).
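
Spelled out, the arithmetic behind those limits (8kB pages, i.e. 2^{13} bytes):

    2^{32} \times 2^{13}\,\mathrm{B} = 2^{45}\,\mathrm{B} = 32\ \mathrm{TB}
    2^{37} \times 2^{13}\,\mathrm{B} = 2^{50}\,\mathrm{B} = 1\ \mathrm{PB}
    2^{48} \times 2^{13}\,\mathrm{B} = 2^{61}\,\mathrm{B} = 2048\ \mathrm{PB}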


The column store approach is somehow orthogonal to this, because it 
splits the table vertically into multiple pieces, each stored in a 
separate relfilenode and thus using a separate sequence of page numbers.


And of course, the usual 'horizontal' partitioning has a very similar 
effect (separate filenodes).


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  I've been trying to figure out a plan to enable native column stores
  (CS or colstore) for Postgres.  Motivations:
 
  * avoid the 32 TB limit for tables
  * avoid the 1600 column limit for tables
  * increased performance
 
 To me, it feels like there are two different features here that would
 be better separated.  First, there's the idea of having a table that
 gets auto-joined to other tables whenever you access it, so that the
 user sees one really wide table but really the data is segregated by
 column groups under the hood.  That's a neat idea.  

Thanks.  (It also seems pretty tricky to implement.)

 Second, there's the idea of a way of storing tuples that is different
 from PostgreSQL's usual mechanism - i.e. a storage manager API. I
 understand your concerns about going through the FDW API so maybe
 that's not the right way to do it, but it seems to me that in the end
 you are going to end up with something awfully similar to that + a
 local relfilenode + WAL logging support.  I'm not clear on why you
 want to make the column store API totally different from the FDW API;
 there may be a reason, but I don't immediately see it.

I just don't see that the FDW API is such a good fit for what I'm trying
to do.  Anything using the FDW API needs to implement its own visibility
checking, for instance.  I want to avoid that, because it's its own
complex problem.  Also, it doesn't look like the FDW API supports things
that I want to do (neither my proposed LateColumnMaterialization nor my
proposed BitmapColumnScan).  I would have to extend the FDW API, then
contort my stuff so that it fits in the existing API; then I will need
to make sure that existing FDWs are not broken by the changes I would
propose elsewhere.  Round peg, square hole is all I see here.  All in
all, this seems too much additional work, just to make two things that
are really addressing different problems go through the same code.

You're correct about local WAL logging.  We will need a solution to
that problem.  I was hoping to defer that until we had something like
Alexander Korotkov's proposed pluggable WAL stuff.


 Each of these two features is independently useful.  If you just had
 the first feature, you could use the existing table format as your
 columnar store.  I'm sure it's possible to do a lot better in some
 cases, but there could easily be cases where that's a really big win,
 because the existing table format has far more sophisticated indexing
 capabilities than any columnar store is likely to have in an early
 version.

Yeah, sure, it's pretty likely that the first experimental colstore
implementation will just be based on existing infrastructure.

 The second capability, of course, opens up all sorts of
 interesting possibilities, like compressed read-only storage or
 index-organized tables.  And it would also let you have an
 all-columnar table, similar to what Citus's cstore_fdw does, which
 doesn't seem like it would be supported in your design, and could be a
 pretty big win for certain kinds of tables.

Well, I would like to know about those use cases that Citus stuff is
good at, so that we can make sure they work reasonably under my proposed
design.  Maybe I have to require vacuuming so that the all-visible bits
are set, so that a column scan can skip visibility checking for most of
the underlying heap tuples to produce a large aggregation report.  That
seems pretty reasonable to me.

 BTW, I'm not sure if it's a good idea to call all of this stuff
 cstore.  The name is intuitive, but cstore_fdw has enough traction
 already that it may create some confusion.

I'm not thinking of calling anything user-visible with the name
cstore.  It's just a development term.  Surely we're not reserving
names from what is used in third party code.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 OK, new idea.  What about, instead of having the last name be all-caps,
 we have the last name start with an uppercase letter, then use smallcaps
 for the rest of the last name:
   https://en.wikipedia.org/wiki/Small_caps
 That way, the last name will not appear too large, but will be clear as
 something different from other names.  Peter, I assume small-caps is
 possible.

FWIW, I vote strongly against having any contributor names in caps in the
release notes.  It would be visually distracting, and it would make the
name look like the most important thing in the entry, while in point of
fact it's the *least* important.  (Maybe not to the contributor, but
certainly to anybody else.)

For pretty much the same reason, I'm not in favor of small caps either.
Even assuming we can do that consistently (which I bet we can't; we
do not have all that much control over how web browsers render HTML),
it would be calling attention to itself, which is exactly not the result
I think we should be after.

regards, tom lane




Re: [HACKERS] On columnar storage

2015-06-14 Thread Tomas Vondra


On 06/14/15 17:22, Alvaro Herrera wrote:

Robert Haas wrote:

,,,

Second, there's the idea of a way of storing tuples that is different
from PostgreSQL's usual mechanism - i.e. a storage manager API. I
understand your concerns about going through the FDW API so maybe
that's not the right way to do it, but it seems to me that in the end
you are going to end up with something awfully similar to that + a
local relfilenode + WAL logging support.  I'm not clear on why you
want to make the column store API totally different from the FDW API;
there may be a reason, but I don't immediately see it.


I just don't see that the FDW API is such a good fit for what I'm trying
to do.  Anything using the FDW API needs to implement its own visibility
checking, for instance.  I want to avoid that, because it's its own
complex problem.  Also, it doesn't look like the FDW API supports things
that I want to do (neither my proposed LateColumnMaterialization nor my
proposed BitmapColumnScan).  I would have to extend the FDW API, then
contort my stuff so that it fits in the existing API; then I will need
to make sure that existing FDWs are not broken by the changes I would
propose elsewhere.  Round peg, square hole is all I see here.  All in
all, this seems too much additional work, just to make two things that
are really addressing different problems go through the same code.

You're correct about local WAL logging.  We will need a solution to
that problem.  I was hoping to defer that until we had something like
Alexander Korotkov's proposed pluggable WAL stuff.


Probably worth mentioning that we don't expect the column store API to be 
used only by extensions, but also by code within core, which can use the 
current WAL infrastructure etc.


--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Andres Freund wrote:
 On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote:
  Rewriter
  
  Parsing occurs as currently.  During query rewrite, specifically at the
  bottom of the per-relation loop in fireRIRrules(), we will modify the
  query tree: each relation RTE containing a colstore will be replaced
  with a JoinExpr containing the relation as left child and the colstore
  as right child (1).  The colstore RTE will be of a new RTEKind.  For
  each such change, all Var nodes that point to attnums stored in the
  colstore will be modified so that they reference the RTE of the colstore
  instead (2).
 
 FWIW, I think this is a pretty bad place to tackle this. For one I think
 we shouldn't add more stuff using the rewriter unless it's clearly the
 best interface. For another, doing things in the rewriter will make
 optimizing things much harder - the planner will have to reconstruct
 knowledge which of the joins are column store joins and such.

What do you mean reconstruct?  That bit will be explicit -- I'm thinking
the planner will have specialized bits to deal with such nodes, i.e. it
will know there's a one-to-one relationship between colstore tuples and
heap tuples, so it will know how to move nodes around.

Or at least, that's how I'm hoping it will be ...

 Why do you want to do things there?

Because I see no better place.  Isn't the rewriter the place where we
modify the query tree, mostly?

Another choice I considered was subquery_planner: in the spot where
expand_inherited_tables() is called, add a new call to expand the RTEs
that correspond to tables containing stores.  But I had second thoughts
because that function says it's safe because it only adds baserels, not
joins, and I'm adding joins.  I guess I could use a spot earlier than
wherever it is that joins are checked, but this planner code is
recursive anyway and for this I must do a single pass.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Bruce Momjian
On Sun, Jun 14, 2015 at 01:57:08PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote:
 
   Also, it looks like we could spell your last name with an accent,
   assuming the i-acute character in Latin1 is fine.
  
  Ah, you are correct.  I found a few commits that did have that accent. 
  All _seven_ of Petr's 9.5 commit mentions updated to add the accent. 
  :-)  Thanks.
 
 ... though two of them now say Petr Petr.

Thanks, fixed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  Another model that could be followed is expansion of inheritance-tree
  references, which happens early in the planner.
 
 Actually ... if you intend to allow column storage to work with inherited
 tables (and if you don't, you'd better have a darn good reason why not),
 I think you probably want to do this join insertion *after* inheritance
 expansion, so you can join child column stores only to the appropriate
 child heap table, and not to the entire inheritance tree.

Won't this cause issues to MergeAppend optimizations?

I guess we could have the join be of a new type that's transparent to
such optimizations but, you see, that seems complicated enough that I
want to avoid that ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] Git humor

2015-06-14 Thread Andrew Dunstan


On 06/14/2015 02:07 PM, Bruce Momjian wrote:

If you ever had trouble understanding a git manual page, this humorous
random git man page generator is full of fun:

http://git-man-page-generator.lokaltog.net/

Try a few generate runs to find one that seems logical.  :-)




This looked good to me:

    git-dodge-tree    dodge some local trees outside a few fscked applied tips


cheers

andrew




[HACKERS] The real reason why TAP testing isn't ready for prime time

2015-06-14 Thread Tom Lane
Buildfarm member hamster has failed a pretty significant fraction of
its recent runs in the BinInstallCheck step:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hamster&br=HEAD

Since other critters aren't equally distressed, it seems likely that
this is just an out-of-disk-space type of problem.  But maybe it's
trying to tell us that there's a genuine platform-specific bug there.
In any case, I challenge anyone to figure out what's happening from
the information available from the buildfarm logs.

I don't know whether this is just that the buildfarm script isn't
collecting data that it should be.  But my experiences with the
TAP test scripts haven't been very positive.  When they fail, it
takes a lot of digging to find out why.  Basically, that entire
mechanism sucks as far as debuggability is concerned.

I think there is a good argument for turning this off in the buildfarm
until there is a better way of identifying and solving problems.  It is
not helping us that hamster is red half the time for undiscoverable
reasons.  That just conditions people to ignore it, and it may well be
masking real problems that the machine could be finding if it weren't
failing at this step.

regards, tom lane




Re: [HACKERS] Need Multixact Freezing Docs

2015-06-14 Thread Alvaro Herrera
Josh Berkus wrote:
 
  You can see the current multixact value in pg_controldata output.  Keep
  timestamped values of that somewhere (a table?) so that you can measure
  consumption rate.  I don't think we provide SQL-level access to those
  values.
 
 Bleh.  Do we provide SQL-level access in 9.4?  If not, I think that's a
 requirement before release.  Telling users to monitor a setting using a
 restricted-permission command-line utility which produces a
 version-specific text file they have to parse is not going to win us a
 lot of fans.

I found that I had written a very quick accessor function for multixact
shared state data a while ago.  This might be useful for monitoring
purposes.  What do people think of including this for 9.5?  It needs a
small change to add the newly added oldestOffset (plus a little cleanup
and docs).
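
For monitoring, a hypothetical usage sketch (column names as defined in
the attached patch; the function only exists once the patch is applied):

    CREATE TABLE multixact_samples
        (ts timestamptz, oldestmulti xid, nextmulti xid, nextoffset xid);

    -- run periodically so the consumption rate can be derived from
    -- consecutive samples
    INSERT INTO multixact_samples
        SELECT now(), * FROM pg_get_multixact_range();

    SELECT * FROM multixact_samples ORDER BY ts;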

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services
From 2134f7928cad8c8d1a1e2d752c9443fb44a92a7e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Sun, 14 Jun 2015 11:48:58 -0300
Subject: [PATCH] pg_get_multixact_range

---
 src/backend/access/transam/multixact.c | 31 +++
 src/include/catalog/pg_proc.h  |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 516a89f..dedded3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3274,3 +3274,34 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
 
 	SRF_RETURN_DONE(funccxt);
 }
+
+#include "access/htup_details.h"
+
+Datum
+pg_get_multixact_range(PG_FUNCTION_ARGS);
+
+Datum
+pg_get_multixact_range(PG_FUNCTION_ARGS)
+{
+	TupleDesc		tupdesc;
+	HeapTuple		htup;
+	Datum			values[3];
+	bool			nulls[3];
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+	tupdesc = BlessTupleDesc(tupdesc);
+
+	LWLockAcquire(MultiXactGenLock, LW_SHARED);
+	values[0] = MultiXactState->oldestMultiXactId;
+	values[1] = MultiXactState->nextMXact;
+	values[2] = MultiXactState->nextOffset;
+	LWLockRelease(MultiXactGenLock);
+
+	nulls[0] = nulls[1] = nulls[2] = false;
+
+	htup = heap_form_tuple(tupdesc, values, nulls);
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6b3d194..f87d3d0 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3079,6 +3079,8 @@ DATA(insert OID = 1065 (  pg_prepared_xact PGNSP PGUID 12 1 1000 0 0 f f f f t t
 DESCR(view two-phase transactions);
 DATA(insert OID = 3819 (  pg_get_multixact_members PGNSP PGUID 12 1 1000 0 0 f f f f t t v 1 0 2249 28 {28,28,25} {i,o,o} {multixid,xid,mode} _null_ _null_ pg_get_multixact_members _null_ _null_ _null_ ));
 DESCR(view members of a multixactid);
+DATA(insert OID = 3401 (  pg_get_multixact_range   PGNSP PGUID 12 10 0 0 f f f f t f s 0 0 2249  {28,28,28} {o,o,o} {oldestmulti,nextmulti,nextoffset} _null_ _null_ pg_get_multixact_range _null_ _null_ _null_ ));
+DESCR(get range of live multixacts);
 
 DATA(insert OID = 3581 ( pg_xact_commit_timestamp PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 1184 28 _null_ _null_ _null_ _null_ _null_ pg_xact_commit_timestamp _null_ _null_ _null_ ));
 DESCR(get commit timestamp of a transaction);
-- 
2.1.4




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-14 Thread Shay Rojansky
On Sun, Jun 14, 2015 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Shay Rojansky r...@roji.org writes:
  [ rely on non-blocking sockets to avoid deadlock ]

 Yeah, that's pretty much the approach libpq has taken: write (or read)
 when you can, but press on when you can't.


Good to hear.

 The main issue I'm concerned about
  is SSL/TLS, which is a layer on top of the sockets and which might not
 work
  well with non-blocking sockets...

 We have not had word of any such problem with libpq.  It's possible that
 the intersection of SSL users with non-blocking-mode users is nil, but
 I kinda doubt that.  You do need to interpret openssl's return codes
 correctly ...


I don't think there's a problem with non-blocking I/O and SSL per se, the
question is about the .NET TLS/SSL implementation Npgsql uses - so it's
really totally unrelated to PostgreSQL...

Shay


[HACKERS] Git humor

2015-06-14 Thread Bruce Momjian
If you ever had trouble understanding a git manual page, this humorous
random git man page generator is full of fun:

http://git-man-page-generator.lokaltog.net/

Try a few generate runs to find one that seems logical.  :-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Tom Lane wrote:
  Actually ... if you intend to allow column storage to work with inherited
  tables (and if you don't, you'd better have a darn good reason why not),
  I think you probably want to do this join insertion *after* inheritance
  expansion, so you can join child column stores only to the appropriate
  child heap table, and not to the entire inheritance tree.
 
  Won't this cause issues to MergeAppend optimizations?
 
 Like what?

Well, as I understand, MergeAppend needs to know the sort order of the
child node, right?  But that's available only on the relation RTE, not
on the colstore-join RTE.  Though now that I think about it, maybe I can
push that info from the relation RTE to the colstore-join RTE, since I
know the ordering will be the same.

 And if there are such issues, why do you think you wouldn't be
 expected to solve them?

Precisely.  If I simply reject having column stores in partitioned
tables, then I don't *need* to solve them.  Later in the project, when
some planner hacker decides to join, I can ask them for advice on how to
tackle them.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.5 feature count

2015-06-14 Thread Bruce Momjian
I have run a script to count the number of <listitem> items in the
major release notes of each major version of Postgres back to 7.4:

	7.4	280
	8.0	238
	8.1	187
	8.2	230
	8.3	237
	8.4	330
	9.0	252
	9.1	213
	9.2	250
	9.3	187
	9.4	217
	9.5	176

The 9.5 number will only change a little by 9.5 final.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Sat, Jun 13, 2015 at 05:47:59AM -0300, Alvaro Herrera wrote:

  Also, it looks like we could spell your last name with an accent,
  assuming the i-acute character in Latin1 is fine.
 
 Ah, you are correct.  I found a few commits that did have that accent. 
 All _seven_ of Petr's 9.5 commit mentions updated to add the accent. 
 :-)  Thanks.

... though two of them now say Petr Petr.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] On columnar storage

2015-06-14 Thread Tom Lane
I wrote:
 Another model that could be followed is expansion of inheritance-tree
 references, which happens early in the planner.

Actually ... if you intend to allow column storage to work with inherited
tables (and if you don't, you'd better have a darn good reason why not),
I think you probably want to do this join insertion *after* inheritance
expansion, so you can join child column stores only to the appropriate
child heap table, and not to the entire inheritance tree.

regards, tom lane




Re: [HACKERS] On columnar storage

2015-06-14 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Another choice I considered was subquery_planner: in the spot where
 expand_inherited_tables() is called, add a new call to expand the RTEs
 that correspond to tables containing stores.  But I had second thoughts
 because that function says it's safe because it only adds baserels, not
 joins, and I'm adding joins.

I think that comment is only justifying why we don't need to redetermine
hasJoinRTEs/hasLateralRTEs/hasOuterJoins afterwards.

regards, tom lane




Re: [HACKERS] Memory Accounting v11

2015-06-14 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 This patch tracks memory usage (at the block level) for all memory
 contexts. Individual palloc()s aren't tracked; only the new blocks
 allocated to the memory context with malloc().
 ...
 My general answer to the performance concerns is that they aren't a
 reason to block this patch, unless someone has a suggestion about how to
 fix them. Adding one field to a struct and a few arithmetic operations
 for each malloc() or realloc() seems reasonable to me.

Maybe, but this here is a pretty weak argument:

 The current state, where HashAgg just blows up the memory, is just not
 reasonable, and we need to track the memory to fix that problem.

Meh.  HashAgg could track its memory usage without loading the entire
system with a penalty.  Moreover, this is about fourth or fifth down the
list of the implementation problems with spilling hash aggregation to
disk.  It would be good to see credible solutions for the bigger issues
before we buy into adding overhead for a mechanism with no uses in core.

 Others have also mentioned that we might want to use this mechanism to
 track memory for other operators, like Sort or HashJoin, which might be
 simpler and more accurate.

That would imply redefining the meaning of work_mem for those operations;
it would suddenly include a lot more overhead space than it used to,
causing them to spill to disk much more quickly than before, unless one
changes the work_mem setting to compensate.  Not sure that people would
like such a change.

An even bigger problem is that it would pretty much break the logic around
LACKMEM() in tuplesort.c, which supposes that spilling individual tuples
to disk is a reliable and linear way to decrease the accounted-for memory.
Individual pfree's would typically not decrease the accounted total in
this implementation.  Depending on what you have in mind for the
spill-to-disk behavior, this issue could be a fail for HashAgg as well,
which is another reason not to commit this patch in advance of seeing the
use-case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-06-14 Thread Peter Geoghegan
On Sat, Jun 13, 2015 at 6:24 PM, Bruce Momjian br...@momjian.us wrote:
 I think we should really address this. Attached patch adds a new
 release note item for it. It also adds to the documentation that
 explains why users should prefer varchar(n)/text to character(n); the
 lack of abbreviated key support now becomes a huge disadvantage for
 character(n), whereas in previous versions the disadvantages were
 fairly minor.

 In passing, I updated the existing sort item to reflect that only
 varchar(n), text, and numeric benefit from the abbreviation
 optimization (not character types more generally + numeric), and added
 a note on the effectiveness of the abbreviation optimization alone.

 This all also seems like overkill to me.  Users just don't care about
 this level of detail, and are easily confused by it.

Really? The pre-check thing wasn't too complex for Magnus to have a
couple of bullet points on it *specifically* in his high level NYC
talk on Postgres 9.5 features [1]. I don't think it's hard to
understand at all.

Also, it's simply incorrect to describe abbreviation as: Improve the
speed of sorting character and numeric fields. Character fields
presumably include character(n), and as I pointed out character(n)
lacks abbreviation support. That needs to be documented as an
additional disadvantage of character(n) vs text/varchar(n) in the
documentation where the trade-off is already discussed (i.e. not in
the release notes), because it makes a *huge* difference. And ISTM
that it should also be clear from the release notes themselves. We
seemingly always have type /type tags when types are named in
release notes.
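
To illustrate the point on a scratch table (just a sketch, with made-up
table and column names): load identical data into a text column and a
character(n) column and compare the sort timings; on 9.5 only the text
sort gets abbreviated keys:

create table abbrev_test as
  select md5(i::text)::text as t, md5(i::text)::character(32) as c
  from generate_series(1, 1000000) i;

explain analyze select * from abbrev_test order by t;  -- abbreviated keys
explain analyze select * from abbrev_test order by c;  -- no abbreviation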

[1] http://www.hagander.net/talks/postgresql95.pdf
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Git humor

2015-06-14 Thread Jan de Visser
On June 14, 2015 02:07:16 PM Bruce Momjian wrote:
 If you ever had trouble understanding a git manual page, this humorous
 random git man page generator is full of fun:
 
   http://git-man-page-generator.lokaltog.net/
 
 Try a few generate runs to find one that seems logical.  :-)

To my untrained eye those look an awful lot like they're lifted from random 
pgsql-hackers threads. 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory Accounting v11

2015-06-14 Thread Jeff Davis
This patch tracks memory usage (at the block level) for all memory
contexts. Individual palloc()s aren't tracked; only the new blocks
allocated to the memory context with malloc().

It also adds a new function, MemoryContextMemAllocated() which can
either retrieve the total allocated for the context, or it can recurse
and sum up the allocations for all subcontexts as well.

This is intended to be used by HashAgg in an upcoming patch that will
bound the memory and spill to disk.

Previous discussion here:

http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop

Previous concerns:

* There was a slowdown reported of around 1-3% (depending on the exact
version of the patch) on an IBM power machine when doing an index
rebuild. The results were fairly noisy for me, but it seemed to hold up.
See http://www.postgresql.org/message-id/CA
+Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com 
* Adding a struct field to MemoryContextData may be bad for the CPU
caching behavior, and may be the cause of the above slowdown.
* Previous versions of the patch updated the parent contexts'
allocations as well, which avoided the need to recurse when querying the
total allocation. That seemed to penalize cases that didn't need to
track the allocation. We discussed trying to opt-in to this behavior,
but it seemed more awkward than helpful. Now, the patch only updates the
allocation of the current context, and querying means recursing through
the child contexts.
* There was a concern that, if MemoryContextMemAllocated needs to
recurse to the child contexts, it will be too slow for HashAgg of
array_agg, which creates a child context per group. That was solved with
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b419865a814abbca12bdd6eef6a3d5ed67f432e1

My general answer to the performance concerns is that they aren't a
reason to block this patch, unless someone has a suggestion about how to
fix them. Adding one field to a struct and a few arithmetic operations
for each malloc() or realloc() seems reasonable to me.

The current state, where HashAgg just blows up the memory, is just not
reasonable, and we need to track the memory to fix that problem. Others
have also mentioned that we might want to use this mechanism to track
memory for other operators, like Sort or HashJoin, which might be
simpler and more accurate.

Regards,
Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 500,505  AllocSetContextCreate(MemoryContext parent,
--- 500,508 
	 errdetail("Failed while creating memory context \"%s\".",
			   name)));
  		}
+ 
+ 		((MemoryContext) set)->mem_allocated += blksize;
+ 
  		block->aset = set;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
***
*** 590,595  AllocSetReset(MemoryContext context)
--- 593,600 
  		else
  		{
  			/* Normal case, release the block */
+ 			context->mem_allocated -= block->endptr - ((char*) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  			wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***
*** 632,637  AllocSetDelete(MemoryContext context)
--- 637,644 
  	{
  		AllocBlock	next = block->next;
  
+ 		context->mem_allocated -= block->endptr - ((char *) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  		wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***
*** 672,677  AllocSetAlloc(MemoryContext context, Size size)
--- 679,687 
  		block = (AllocBlock) malloc(blksize);
  		if (block == NULL)
  			return NULL;
+ 
+ 		context->mem_allocated += blksize;
+ 
  		block->aset = set;
  		block->freeptr = block->endptr = ((char *) block) + blksize;
  
***
*** 861,866  AllocSetAlloc(MemoryContext context, Size size)
--- 871,878 
  		if (block == NULL)
  			return NULL;
  
+ 		context->mem_allocated += blksize;
+ 
  		block->aset = set;
  		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
  		block->endptr = ((char *) block) + blksize;
***
*** 964,969  AllocSetFree(MemoryContext context, void *pointer)
--- 976,984 
  			set->blocks = block->next;
  		else
  			prevblock->next = block->next;
+ 
+ 		context->mem_allocated -= block->endptr - ((char*) block);
+ 
  #ifdef CLOBBER_FREED_MEMORY
  		wipe_mem(block, block->freeptr - ((char *) block));
  #endif
***
*** 1077,1082  AllocSetRealloc(MemoryContext context, void *pointer, Size size)
--- 1092,1098 
  		AllocBlock	prevblock = NULL;
  		Size		chksize;
  		Size		blksize;
+ 		Size		oldblksize;
  
  		while (block != NULL)
  		{
***
*** 1094,1102  AllocSetRealloc(MemoryContext context, void *pointer, Size size)
--- 1110,1123 
  		/* Do the realloc */
  		chksize = MAXALIGN(size);
  		blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+ 		oldblksize = block->endptr - ((char *)block);
+ 
  		block = (AllocBlock) realloc(block, blksize);
  		if 

Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Amit Kapila wrote:
  Will the column store obey snapshot model similar to current heap tuples,
  if so will it derive the transaction information from heap tuple?
 
  Yes, visibility will be tied to the heap tuple -- a value is accessed
  only when its corresponding heap row has already been determined to be
  visible.  One interesting point that raises from this is about vacuum:
  when are we able to remove a value from the store?  I have some
  not-completely-formed ideas about this.
 
 Hm.  This seems not terribly ambitious --- mightn't a column store
 extension wish to store tables *entirely* in the column store, rather
 than tying them to a perhaps-vestigial heap table?

Well, yes, it might, but that opens a huge can of worms.  What heapam
offers is not just tuple storage, but a lot of functionality on top of
that -- in particular, tuple locking and visibility.  I am certainly not
considering re-implementing any of that.  We might eventually go there,
but we will *additionally* need different implementations of those
things, and I'm pretty sure that will be painful, so I'm trying to stay
away from that.

 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Tom Lane wrote:
  I can't help thinking that this could tie in with the storage level API
  that I was waving my arms about last year.  Or maybe not --- the goals
  are substantially different --- but I think we ought to reflect on that
  rather than just doing a narrow hack for column stores used in the
  particular way you're describing here.
 
  I can't seem to remember this proposal you mention.  Care to be more
  specific?  Perhaps a link to archives is enough.
 
 I never got to the point of having a concrete proposal, but there was a
 discussion about it at last year's PGCon unconference; were you there?

No, regrettably I wasn't there.

 Anyway the idea was to try to cut a clearer line between heap storage
 and the upper levels of the system, particularly the catalog/DDL code
 that we have so much of.  Based on Salesforce's experience so far,
 it's near impossible to get rid of HeapTuple as the lingua franca for
 representing rows in the upper system levels, so we've not really tried;
 but it would be nice if the DDL code weren't so much in bed with
 heap-specific knowledge, like the wired-into-many-places assumption that
 row insert and update actions require index updates but deletions don't.

Agreed on both counts.  As far as catalog code goes, removing direct
mapping from HeapTuple to C structs would require a huge rewrite of tons
of code.  Unless we're considering rewriting small pieces of specific
catalog handling at a time, it seems unlikely that we will have columns
from system catalogs in column stores.  (It seems reasonable that as
soon as we have column stores, we can have particular catalog code to
work with columnar storage, but I don't think there's much need for that
currently.)

I agree with your second point also --- it might be good to have a layer
in between, and it seems not completely unreasonable.  It would require
touching lots of places but not hugely transforming things.  (I think
it's not in the scope of the things I'm currently after, though.)

 We're also not very happy with the general assumption that a TID is an
 adequate row identifier (as our storage engine does not have TIDs), so
 I'm a bit disappointed to see you doubling down on that restriction
 rather than trying to lift it.

Well, in the general design, there is room for different tuple
identifiers.  I'm just not implementing it for the first version.

 Now much of this pain only comes into play if one is trying to change
 the underlying storage format for system catalogs, which I gather is
 not considered in your proposal.  But if you want one format for catalogs
 and another for user tables then you have issues like how do you guarantee
 atomic commit and crash safety across multiple storage engines.  That way
 lies a mess, especially if you're trying to keep the engines at arms'
 length which is what a pluggable architecture implies.  MySQL is a
 cautionary example we should be keeping in mind while thinking about this.

Right.  I don't want a separate storage engine that needs to
reimplement transactions (as is the case in MySQL), or visibility rules.
I don't want to have a different format for tables or catalogs; both
would still be based on the current heapam API.  I simply want to extend
the API so that I can have some columns in a separate place.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On columnar storage

2015-06-14 Thread Petr Jelinek

On 2015-06-14 17:36, Tomas Vondra wrote:


On 06/14/15 17:22, Alvaro Herrera wrote:

Robert Haas wrote:

...

Second, there's the idea of a way of storing tuples that is different
from PostgreSQL's usual mechanism - i.e. a storage manager API. I
understand your concerns about going through the FDW API so maybe
that's not the right way to do it, but it seems to me that in the end
you are going to end up with something awfully similar to that + a
local relfilenode + WAL logging support.  I'm not clear on why you
want to make the column store API totally different from the FDW API;
there may be a reason, but I don't immediately see it.


I just don't see that the FDW API is such a good fit for what I'm trying
to do.  Anything using the FDW API needs to implement its own visibility
checking, for instance.  I want to avoid that, because it's its own
complex problem.  Also, it doesn't look like the FDW API supports things
that I want to do (neither my proposed LateColumnMaterialization nor my
proposed BitmapColumnScan).  I would have to extend the FDW API, then
contort my stuff so that it fits in the existing API; then I will need
to make sure that existing FDWs are not broken by the changes I would
propose elsewhere.  Round peg, square hole is all I see here.  All in
all, this seems too much additional work, just to make to things that
are really addressing different problems go through the same code.

You're correct about local WAL logging.  We will need a solution to
that problem.  I was hoping to defer that until we had something like
Alexander Korotkov's proposed pluggable WAL stuff.


Probably worth mentioning we don't expect the column store API to be
used only by extensions, but even by code from within the core which can
use the current WAL infrastructure etc.



I think it would be even OK if the initial implementation had 
restrictions such that extensions can't use it. I think we want an 
implementation in core first; support for extensions can come later. 
But it's still worth having an API, since a single implementation 
probably won't fit every purpose (see indexes).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On columnar storage

2015-06-14 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 Actually ... if you intend to allow column storage to work with inherited
 tables (and if you don't, you'd better have a darn good reason why not),
 I think you probably want to do this join insertion *after* inheritance
 expansion, so you can join child column stores only to the appropriate
 child heap table, and not to the entire inheritance tree.

 Won't this cause issues to MergeAppend optimizations?

Like what?  And if there are such issues, why do you think you wouldn't
be expected to solve them?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On columnar storage

2015-06-14 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Tom Lane wrote:
  Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Won't this cause issues to MergeAppend optimizations?
 
  Like what?
 
  Well, as I understand, MergeAppend needs to know the sort order of the
  child node, right?  But that's available only on the relation RTE, not
  on the colstore-join RTE.
 
 Uh, what?  Sort order is a property of a path, not an RTE.

Evidently need to do more digging .. but that makes plenty of sense.

 And we have always understood which join types preserve sort order.

That's obvious now that you say it.

 You misunderstood the thrust of my comment, which basically is that
 I doubt anyone will think that rejecting that combination is an
 acceptable implementation restriction.  It might be all right if it
 doesn't work very well in v0, but not if the implementation is designed
 so that it can never be fixed.

Gotcha.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-06-14 Thread Tomas Vondra

Hi,

On 06/14/15 22:21, Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

This patch tracks memory usage (at the block level) for all memory
contexts. Individual palloc()s aren't tracked; only the new blocks
allocated to the memory context with malloc().
...
My general answer to the performance concerns is that they aren't a
reason to block this patch, unless someone has a suggestion about how to
fix them. Adding one field to a struct and a few arithmetic operations
for each malloc() or realloc() seems reasonable to me.


Maybe, but this here is a pretty weak argument:


The current state, where HashAgg just blows up the memory, is just not
reasonable, and we need to track the memory to fix that problem.


Meh. HashAgg could track its memory usage without loading the entire
system with a penalty.


+1 to a solution like that, although I don't think that's doable without 
digging the info out of the memory contexts somehow.


 Moreover, this is about fourth or fifth down the

list of the implementation problems with spilling hash aggregation to
disk.  It would be good to see credible solutions for the bigger issues
before we buy into adding overhead for a mechanism with no uses in core.


I don't think so. If we don't have an acceptable solution for tracking 
memory in hashagg, there's not really much point in doing any of the 
following steps. That's why Jeff is tackling this problem first.


Also, Jeff already posted a PoC of the memory-bounded hashagg, although 
it only worked for aggregates with fixed-size state, and not that great 
for cases like array_agg where the state grows. Maybe the improvements 
to aggregate functions proposed by David Rowley last week might help in 
those cases, as that'd allow spilling those states to disk.


So while the approach proposed by Jeff may turn out not to be the right 
one, I think memory accounting needs to be solved first (even if it's 
not committed until the whole feature is ready).



Others have also mentioned that we might want to use this mechanism
to track memory for other operators, like Sort or HashJoin, which
might be simpler and more accurate.


That would imply redefining the meaning of work_mem for those
operations; it would suddenly include a lot more overhead space than
it used to, causing them to spill to disk much more quickly than
before, unless one changes the work_mem setting to compensate. Not
sure that people would like such a change.


Don't we tweak the meaning of work_mem over time anyway? Maybe not to such 
an extent, but for example the hashjoin improvements in 9.5 certainly change 
this by packing the tuples more densely, sizing the buckets differently, 
etc. It's true the changes are in the other direction (i.e. batching 
less frequently), though.


OTOH this would make the accounting more accurate (e.g. eliminating 
differences between cases with different amounts of overhead because of 
different tuple widths). Wouldn't that be a good thing, even 
if people need to tweak work_mem? Isn't that kinda acceptable when 
upgrading to the next major version?
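
To make the threshold effect concrete (just a sketch): whether a sort 
stays in memory or spills is decided by comparing the accounted total 
against work_mem, so counting more overhead simply moves the point at 
which the same query starts spilling:

SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT g FROM generate_series(1, 500000) g ORDER BY g;
-- reports "Sort Method: external merge" (spilled to disk)
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT g FROM generate_series(1, 500000) g ORDER BY g;
-- reports "Sort Method: quicksort" (stayed in memory)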



An even bigger problem is that it would pretty much break the logic
around LACKMEM() in tuplesort.c, which supposes that spilling
individual tuples to disk is a reliable and linear way to decrease
the accounted-for memory. Individual pfree's would typically not
decrease the accounted total in this implementation. Depending on
what you have in mind for the spill-to-disk behavior, this issue
could be a fail for HashAgg as well, which is another reason not to
commit this patch in advance of seeing the use-case.


That's true, but I think the plan was always to wait for the whole 
feature, and only then commit all the pieces.



--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] creating extension including dependencies

2015-06-14 Thread Petr Jelinek

Hi,

I am getting tired of installing required extensions manually. I was 
wondering if we might want to add an option to CREATE EXTENSION that 
would allow automatic creation of the extensions required by the 
extension that is being installed by the user.


I also wrote some prototype patch that implements this.

I call it a prototype mainly because the syntax (CREATE EXTENSION ... 
RECURSIVE) could be improved. I originally wanted to do something like 
INCLUDING DEPENDENCIES, but that would need a new (unreserved) keyword 
and I don't think it's worth it, plus it's wordy.
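
With the patch applied, usage looks something like this (earthdistance is
a handy example because it requires the cube extension; the syntax is of
course only the prototype's):

CREATE EXTENSION earthdistance RECURSIVE;
-- installs cube first (if missing), then earthdistance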


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 7d5db7faae0b30a70ceae020638fa7779810339d Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Sat, 13 Jun 2015 19:49:10 +0200
Subject: [PATCH] CREATE EXTENSION RECURSIVE

---
 doc/src/sgml/ref/create_extension.sgml | 11 
 src/backend/commands/extension.c   | 61 --
 src/backend/parser/gram.y  |  4 ++
 src/test/modules/test_extensions/.gitignore|  3 ++
 src/test/modules/test_extensions/Makefile  | 22 
 .../test_extensions/expected/test_extensions.out   | 16 ++
 .../test_extensions/sql/test_extensions.sql| 11 
 .../modules/test_extensions/test_ext1--1.0.sql |  0
 src/test/modules/test_extensions/test_ext1.control |  4 ++
 .../modules/test_extensions/test_ext2--1.0.sql |  0
 src/test/modules/test_extensions/test_ext2.control |  4 ++
 .../modules/test_extensions/test_ext3--1.0.sql |  0
 src/test/modules/test_extensions/test_ext3.control |  3 ++
 .../test_extensions/test_ext_cyclic1--1.0.sql  |  0
 .../test_extensions/test_ext_cyclic1.control   |  4 ++
 .../test_extensions/test_ext_cyclic2--1.0.sql  |  0
 .../test_extensions/test_ext_cyclic2.control   |  4 ++
 17 files changed, 143 insertions(+), 4 deletions(-)
 create mode 100644 src/test/modules/test_extensions/.gitignore
 create mode 100644 src/test/modules/test_extensions/Makefile
 create mode 100644 src/test/modules/test_extensions/expected/test_extensions.out
 create mode 100644 src/test/modules/test_extensions/sql/test_extensions.sql
 create mode 100644 src/test/modules/test_extensions/test_ext1--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext1.control
 create mode 100644 src/test/modules/test_extensions/test_ext2--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext2.control
 create mode 100644 src/test/modules/test_extensions/test_ext3--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext3.control
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic1--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic1.control
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic2--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_cyclic2.control

diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml
index a1e7e4f8..d151fd6 100644
--- a/doc/src/sgml/ref/create_extension.sgml
+++ b/doc/src/sgml/ref/create_extension.sgml
@@ -25,6 +25,7 @@ CREATE EXTENSION [ IF NOT EXISTS ] replaceable class=parameterextension_name
 [ WITH ] [ SCHEMA replaceable class=parameterschema_name/replaceable ]
  [ VERSION replaceable class=parameterversion/replaceable ]
  [ FROM replaceable class=parameterold_version/replaceable ]
+ [ RECURSIVE ]
 /synopsis
  /refsynopsisdiv
 
@@ -139,6 +140,16 @@ CREATE EXTENSION [ IF NOT EXISTS ] replaceable class=parameterextension_name
/para
   /listitem
  /varlistentry
+
+ varlistentry
+  termliteralRECURSIVE//term
+  listitem
+   para
+Try to install extension including the required dependencies
+recursively.
+   /para
+  /listitem
+ /varlistentry
   /variablelist
  /refsect1
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 5cc74d0..4395519 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -46,6 +46,7 @@
 #include funcapi.h
 #include mb/pg_wchar.h
 #include miscadmin.h
+#include nodes/makefuncs.h
 #include storage/fd.h
 #include tcop/utility.h
 #include utils/builtins.h
@@ -1176,10 +1177,13 @@ CreateExtension(CreateExtensionStmt *stmt)
 	DefElem*d_schema = NULL;
 	DefElem*d_new_version = NULL;
 	DefElem*d_old_version = NULL;
+	DefElem	   *d_recursive = NULL;
 	char	   *schemaName;
 	Oid			schemaOid;
 	char	   *versionName;
 	char	   *oldVersionName;
+	bool		recursive = false;
+	List	   *recursive_parents = NIL;
 	Oid			extowner = GetUserId();
 	ExtensionControlFile *pcontrol;
 	ExtensionControlFile *control;
@@ -1263,6 +1267,14 @@ CreateExtension(CreateExtensionStmt *stmt)
 		 errmsg(conflicting or redundant options)));
 			d_old_version = defel;
 		}

[HACKERS] Sharing aggregate states between different aggregate functions

2015-06-14 Thread David Rowley
Simon and I have been going over some ideas about how to make improvements
to aggregate performance by cutting down on the duplicate work that's done
when 2 aggregate functions are used where one knows how to satisfy all the
requirements of the other.

To cut a long story short, all our ideas will require some additions or
modifications to CREATE AGGREGATE and also pg_dump support.

Tom came up with a simpler idea that gets us some of the way, without
all that pg_dump stuff.
http://www.postgresql.org/message-id/30851.1433860...@sss.pgh.pa.us

This basically allows an aggregate's state to be shared with other
aggregate functions when both aggregates' transition functions (and a few
other things) match.
There's quite a number of aggregates in our standard set which will benefit
from this optimisation.
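
As a rough way of spotting the candidates, grouping pg_aggregate by
transition function and state type gives an approximation (it ignores
initcond and the other conditions the patch actually checks):

SELECT aggtransfn, aggtranstype::regtype, array_agg(aggfnoid) AS aggregates
FROM pg_aggregate
GROUP BY aggtransfn, aggtranstype
HAVING count(*) > 1
ORDER BY count(*) DESC;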

Please find attached a patch which implements this idea.

The performance improvements are as follows:

create table t1 as
select x.x::numeric from generate_series(1,100) x(x);

-- standard case.
select sum(x),avg(x) from t1;

Master:
Time: 350.303 ms
Time: 353.716 ms
Time: 349.703 ms

Patched:
Time: 227.687 ms
Time: 222.563 ms
Time: 224.691 ms

-- extreme case.
select
stddev_samp(x),stddev(x),variance(x),var_samp(x),var_pop(x),stddev_pop(x)
from t1;

Master:
Time: 1464.461 ms
Time: 1462.343 ms
Time: 1450.232 ms

Patched:
Time: 346.473 ms
Time: 348.445 ms
Time: 351.365 ms

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


sharing_agg_states_2c3d4a9_2015-06-15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to improve a few appendStringInfo* calls

2015-06-14 Thread David Rowley
On 29 May 2015 at 12:51, Peter Eisentraut pete...@gmx.net wrote:

 On 5/12/15 4:33 AM, David Rowley wrote:
  Shortly after I sent the previous patch I did a few more searches and
  also found some more things that are not quite right.
  Most of these are to use the binary append method when the length of the
  string is already known.

 For these cases it might be better to invent additional functions such
 as stringinfo_to_text() and appendStringInfoStringInfo() instead of
 repeating the pattern of referring to data and length separately.


You're probably right. It would be nicer to see some sort of wrapper
functions that cleaned these up a bit.

I really think that's something for another patch though, this patch just
intends to put things the way they're meant to be in the least invasive way
possible. What you're proposing is changing the way it's meant to work,
which will cause much more code churn.

I've attached a re-based patch.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


appendStringInfo_fixes_fa48767_2015-06-15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-06-14 Thread Tomas Vondra

Hi,

On 06/14/15 21:43, Jeff Davis wrote:

This patch tracks memory usage (at the block level) for all memory
contexts. Individual palloc()s aren't tracked; only the new blocks
allocated to the memory context with malloc().


I see it's adding the new field as int64 - wouldn't a Size be more 
appropriate, considering that's what we use in mctx.h and aset.c?



It also adds a new function, MemoryContextMemAllocated() which can
either retrieve the total allocated for the context, or it can
recurse and sum up the allocations for all subcontexts as well.

This is intended to be used by HashAgg in an upcoming patch that will
bound the memory and spill to disk.

Previous discussion here:

http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop

Previous concerns:

* There was a slowdown reported of around 1-3% (depending on the exact
version of the patch) on an IBM power machine when doing an index
rebuild. The results were fairly noisy for me, but it seemed to hold up.
See http://www.postgresql.org/message-id/CA
+Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com
* Adding a struct field to MemoryContextData may be bad for the CPU
caching behavior, and may be the cause of the above slowdown.



* Previous versions of the patch updated the parent contexts'
allocations as well, which avoided the need to recurse when querying
the total allocation. That seemed to penalize cases that didn't need
to track the allocation. We discussed trying to opt-in to this
behavior, but it seemed more awkward than helpful. Now, the patch
only updates the allocation of the current context, and querying
means recursing through the child contexts.


I don't think the opt-in idea itself was awkward, it was rather about 
the particular APIs that we came up with, especially when combined with 
the 'context inheritance' thing.


I still think the opt-in approach and updating accounting for the parent 
contexts was the best one, because it (a) minimizes impact in cases that 
don't use the accounting, and (b) makes finding the current amount of 
memory cheap ...



* There was a concern that, if MemoryContextMemAllocated needs to
recurse to the child contexts, it will be too slow for HashAgg of
array_agg, which creates a child context per group. That was solved with
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b419865a814abbca12bdd6eef6a3d5ed67f432e1


I wouldn't say this was solved - we have fixed one particular example 
of such aggregate implementation, because it was causing OOM issues with 
many groups, but there may be other custom aggregates using the same 
pattern.


Granted, built-in aggregates are probably more critical than aggregates 
provided by extensions, but I wouldn't dare to mark this solved ...



My general answer to the performance concerns is that they aren't a
reason to block this patch, unless someone has a suggestion about how
to fix them. Adding one field to a struct and a few arithmetic
operations for each malloc() or realloc() seems reasonable to me.


I'm not buying this, sorry. While I agree that we should not expect the 
memory accounting to be entirely free, we should be very careful about 
the overhead especially if we're dropping the opt-in and thus imposing 
the overhead on everyone.


But the "performance concerns are not a reason to block this patch" approach 
seems wrong. With any other patch a 3% regression would be considered a 
serious issue IMNSHO.



The current state, where HashAgg just blows up the memory, is just
not reasonable, and we need to track the memory to fix that problem.
Others have also mentioned that we might want to use this mechanism
to track memory for other operators, like Sort or HashJoin, which
might be simpler and more accurate.


Dropping the existing memory accounting implementations and keeping just 
this new solution would be nice, but only if we agree the performance impact 
is acceptable. We already have an accounting solution for each of those 
places, so I don't think the unification alone outweighs the regression.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The real reason why TAP testing isn't ready for prime time

2015-06-14 Thread Michael Paquier
On Mon, Jun 15, 2015 at 3:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Buildfarm member hamster has failed a pretty significant fraction of
 its recent runs in the BinInstallCheck step:
 http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hamster&br=HEAD

 Since other critters aren't equally distressed, it seems likely that
 this is just an out-of-disk-space type of problem.  But maybe it's
 trying to tell us that there's a genuine platform-specific bug there.
 In any case, I challenge anyone to figure out what's happening from
 the information available from the buildfarm logs.

 I don't know whether this is just that the buildfarm script isn't
 collecting data that it should be.  But my experiences with the
 TAP test scripts haven't been very positive.  When they fail, it
 takes a lot of digging to find out why.  Basically, that entire
 mechanism sucks as far as debuggability is concerned.

Indeed. I think that one step in the right direction would be to
replace all the calls to system and system_or_bail with a wrapper
routine that uses IPC::Run, so it can capture the logs and store those
logs in each test's base path. The same applies to pg_rewind tests.

 I think there is a good argument for turning this off in the buildfarm
 until there is a better way of identifying and solving problems.  It is
 not helping us that hamster is red half the time for undiscoverable
 reasons.  That just conditions people to ignore it, and it may well be
 masking real problems that the machine could be finding if it weren't
 failing at this step.

hamster is legendarily slow and has a slow disk, which improves its
chances of catching race conditions, and it is the only slow buildfarm
machine enabling the TAP tests (by comparison, dangomushi has never
failed with the TAP tests), so I would prefer to think that the
problem is not specific to ArchLinux ARM. In this case the failure
seems to be related to the timing of the test servers' stops and starts,
even though the -w switch is used with pg_ctl, particularly since PGPORT
is set to the same value for all servers... Still, for the time being I
don't mind disabling them and just did so now. I will try to investigate
further on the machine itself.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On columnar storage

2015-06-14 Thread Michael Nolan
On Sun, Jun 14, 2015 at 10:30 AM, Tomas Vondra tomas.von...@2ndquadrant.com
 wrote:




 Are you looking to avoid all hardware-based limits, or would using a 64
 bit row pointer be possible?  That would give you 2^64 or 1.8 E19 unique
 rows over whatever granularity/uniqueness you use (per table, per
 database, etc.)
 --
 Mike Nolan.


 I don't think the number of tuples is the main problem here, it's the
 number of pages a single relation can have. Looking at the numbers of rows
 as a direct function of TID size is misleading, because the TID is split
 into two fixed parts - page number (32b) and tuple number (16b).

 For the record, 2^48 is 281,474,976,710,656 which ought to be enough for
 anybody, but we waste a large part of that because we assume there might be
 up to 2^16 tuples per page, although the actual limit is way lower (~290
 for 8kB pages, and ~1200 for 32kB pages).

 So we can only have ~4 billion pages, which is where the 32TB limit comes
 from (with 32kB pages it's 128TB).

 Longer TIDs are one a straightforward way to work around this limit,
 assuming you add the bits to the 'page number' field. Adding 16 bits (thus
 using 64-bit pointers) would increase the limit 2^16-times to about 2048
 petabytes (with 8kB pages). But that of course comes with a cost, because
 you have to keep those larger TIDs in indexes etc.

 Another option might be to split the 48 bits differently, by moving 5 bits
 to the page number part of TID (so that we expect ~2048 tuples per page at
 most). That'd increase the limit to 1PB (4PB with 32kB pages).

 The column store approach is somehow orthogonal to this, because it splits
 the table vertically into multiple pieces, each stored in a separate
 relfilenode and thus using a separate sequence of page numbers.

 And of course, the usual 'horizontal' partitioning has a very similar
 effect (separate filenodes).

 regards

 --
 Tomas Vondra   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

 Thanks for the reply. It's been a while since my last data structures
course (1971), but I do remember a few things.  I have never personally
needed more than 1500 columns in a table, but can see how some might.
Likewise, the 32TB limit hasn't affected me yet, either.  I doubt either
ever will.

Solving either or both of those seems like it may at some point require a
larger bit space for (at least some) TIDs, which is why I was wondering if
a goal here is to eliminate all (practical) limits.

It probably doesn't make sense to force all users to use that large bit
space (with the associated space and performance penalties). If there's a
way to do this, then you are all truly wizards. (This all reminds me of how
the IPv4 bit space was parcelled up into Class A, B, C and D addresses, at a
time when people thought 32 bits would last us forever.  Maybe 128 bits
actually will.)
--
Mike Nolan






[HACKERS] query execution time faster with geqo on than off: bug?

2015-06-14 Thread David Kamholz
I've encountered a query with 11 joins whose execution time (i.e., the time
not taken up by planning) is significantly faster with geqo on rather than
off. This is surprising to me and seems like it might be a bug in the
planner, so I am posting it here rather than to -performance.

The query is below, along with EXPLAIN ANALYZE results with geqo on and
off. The server version is 9.4.4. The various geqo options are all set to
the default. join_collapse_limit is set to 12 (the query is much slower
with it set to the default of 8). Let me know what other information might
be helpful in debugging this further. Thanks!
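
For reproduction purposes, the relevant settings can simply be toggled in
one session, roughly like this:

SET join_collapse_limit = 12;  -- let the planner flatten all 12 relations
SET geqo = off;                -- standard exhaustive join search
-- EXPLAIN ANALYZE <query below>
SET geqo = on;                 -- genetic query optimizer
-- EXPLAIN ANALYZE <query below>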

QUERY:

SELECT ex.ex, ex.lv, ex.tt, ex.td, dnsrc.ex AS trex,
((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) +
uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2) AS trq
FROM ex
INNER JOIN lv ON (lv.lv = ex.lv)
INNER JOIN dn ON (dn.ex = ex.ex)
INNER JOIN mn ON (mn.mn = dn.mn)
INNER JOIN ap ON (ap.ap = mn.ap)
INNER JOIN dn AS dn2 ON (dn2.mn = dn.mn)
INNER JOIN dn AS dn3 ON (dn3.ex = dn2.ex)
INNER JOIN dn AS dnsrc ON (dnsrc.mn = dn3.mn)
INNER JOIN mn AS mnsrc ON (mnsrc.mn = dnsrc.mn)
INNER JOIN ap AS apsrc ON (apsrc.ap = mnsrc.ap)
INNER JOIN ex AS exsrc ON (exsrc.ex = dnsrc.ex)
INNER JOIN lv AS lvsrc ON (lvsrc.lv = exsrc.lv)
WHERE
dn.ex != dn2.ex AND
dn3.ex != dnsrc.ex AND
mn.ap != mnsrc.ap AND
dn.ex != dnsrc.ex AND
lcvc(lv.lc, lv.vc) IN ('zul-000') AND
lcvc(lvsrc.lc, lvsrc.vc) IN ('gle-000') AND
exsrc.tt IN ('doras')
GROUP BY ex.ex, dnsrc.ex
ORDER BY trq desc LIMIT 2000;

EXPLAIN ANALYZE with geqo on:


 QUERY PLAN

-
 Limit  (cost=3603.82..3603.82 rows=2 width=57) (actual
time=623.322..623.326 rows=31 loops=1)
   -  Sort  (cost=3603.82..3603.82 rows=2 width=57) (actual
time=623.321..623.321 rows=31 loops=1)
 Sort Key: (((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) +
uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2))
 Sort Method: quicksort  Memory: 27kB
 -  HashAggregate  (cost=3603.60..3603.82 rows=2 width=57) (actual
time=622.074..623.283 rows=31 loops=1)
   Group Key: ex.ex, dnsrc.ex
   -  Nested Loop  (cost=3075.23..3603.59 rows=2 width=57)
(actual time=8.023..620.761 rows=766 loops=1)
 -  Nested Loop  (cost=3075.17..3603.43 rows=2
width=55) (actual time=8.009..619.417 rows=766 loops=1)
   Join Filter: (mn.ap  mnsrc.ap)
   Rows Removed by Join Filter: 3793
   -  Hash Join  (cost=3075.08..3603.21 rows=2
width=63) (actual time=7.984..610.206 rows=4559 loops=1)
 Hash Cond: (dn.ex = ex.ex)
 -  Nested Loop  (cost=3.04..510.58
rows=20564 width=26) (actual time=0.918..509.574 rows=1047967 loops=1)
   Join Filter: (dn.ex  dnsrc.ex)
   Rows Removed by Join Filter: 5396
   -  Nested Loop  (cost=2.92..277.03
rows=118 width=22) (actual time=0.913..113.830 rows=40525 loops=1)
 -  Nested Loop
 (cost=2.87..267.76 rows=118 width=16) (actual time=0.899..60.170
rows=40525 loops=1)
   -  Hash Join
 (cost=2.76..266.08 rows=1 width=12) (actual time=0.878..5.057 rows=1254
loops=1)
 Hash Cond: (
exsrc.lv = lvsrc.lv)
 -  Nested Loop
 (cost=0.43..263.08 rows=871 width=16) (actual time=0.169..4.626 rows=1516
loops=1)
   -  Nested
Loop  (cost=0.34..164.79 rows=871 width=20) (actual time=0.138..1.180
rows=1516 loops=1)
 -
 Nested Loop  (cost=0.23..155.09 rows=5 width=12) (actual time=0.097..0.306
rows=43 loops=1)

 -  Index Scan using ex_tt_idx on ex exsrc  (cost=0.11..2.57 rows=2
width=8) (actual time=0.056..0.061 rows=5 loops=1)

 Index Cond: (tt = 'doras'::text)

 -  Index Scan using dn_ex_idx on dn dnsrc  (cost=0.11..75.55 rows=71
width=8) (actual time=0.027..0.046 rows=9 loops=5)

 Index Cond: (ex = exsrc.ex)
 -
 Index Only Scan using dn_mn_ex_key on dn dn3  (cost=0.11..1.13 rows=81
width=8) (actual time=0.008..0.015 rows=35 loops=43)

 Index Cond: (mn = dnsrc.mn)

 Filter: (ex  dnsrc.ex)

 Rows Removed by Filter: 1

 Heap Fetches: 0
   -  Index
Scan using mn_pkey on mn mnsrc  (cost=0.09..0.10 rows=1 width=8) (actual
time=0.002..0.002 rows=1 loops=1516)

Re: [HACKERS] Moving Pivotal's Greenplum work upstream

2015-06-14 Thread Craig Ringer
Has anything actually happened on the Greenplum release at this point?

All I can find are the same old press releases.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory Accounting v11

2015-06-14 Thread Jeff Davis
On Sun, 2015-06-14 at 16:21 -0400, Tom Lane wrote:
 Meh.  HashAgg could track its memory usage without loading the entire
 system with a penalty.

When I tried doing it outside of the MemoryContexts, it seemed to get
awkward quickly. I'm open to suggestion if you can point me in the right
direction. Maybe I can peek at the sizes of chunks holding state values
and group keys, and combine that with the hash table size-estimating
code?

   Moreover, this is about fourth or fifth down the
 list of the implementation problems with spilling hash aggregation to
 disk.  It would be good to see credible solutions for the bigger issues
 before we buy into adding overhead for a mechanism with no uses in core.

I had several iterations of a full implementation of the spill-to-disk
HashAgg patch[1]. Tomas Vondra has some constructive review comments,
but all of them seemed solvable. What do you see as a major unsolved
issue?

If I recall, you were concerned about things like array_agg, where an
individual state could get larger than work_mem. That's a valid concern,
but it's not the problem I was trying to solve.

Regards,
Jeff Davis

[1]
http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating extension including dependencies

2015-06-14 Thread Pavel Stehule
+1

Does it work at runtime too?
 On 15.6.2015 at 0:51, Petr Jelinek p...@2ndquadrant.com wrote:

 Hi,

 I am getting tired of installing required extensions manually. I was
 wondering if we might want to add an option to CREATE EXTENSION that would
 allow automatic creation of the extensions required by the extension that
 is being installed by the user.

 I also wrote some prototype patch that implements this.

 I call it a prototype mainly because the syntax (CREATE EXTENSION ...
 RECURSIVE) could be improved. I originally wanted to do something like
 INCLUDING DEPENDENCIES, but that would need a new (unreserved) keyword and I
 don't think it's worth it, plus it's wordy.

 --
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




[HACKERS] Time to get rid of PQnoPasswordSupplied?

2015-06-14 Thread Craig Ringer
Hi all

I frequently see users confused by one of our more common and less
clear error messages:

fe_sendauth: no password supplied

What this really means is that the server requested a password for md5
or cleartext authentication but no password was supplied to the client
and it cannot prompt for one in this context.

I'd like to get rid of it. It's clear others have wanted to in the
past, since it's a backward compat #define in libpq-fe.h :

/* Error when no password was given. */
/* Note: depending on this is deprecated; use PQconnectionNeedsPassword(). */
#define PQnoPasswordSupplied  "fe_sendauth: no password supplied\n"

but given the git blame for it:

4f9bf7fc (Tom Lane   2007-12-09 19:01:40 +0000 493) /* Note:
depending on this is deprecated; use PQconnectionNeedsPassword(). */

88fd162e (Bruce Momjian  2004-10-16 03:10:17 +0000 494) #define
PQnoPasswordSupplied  "fe_sendauth: no password supplied\n"

I'm wondering if it's time for this to go away, so we can have a
decent error message for this common error. It's been deprecated for
eight years.

How about:

The server requested a password but no password was supplied to the client

?

I'm sure it can be better than that, but ... well, it's not
"fe_sendauth: no password supplied".


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving Pivotal's Greenplum work upstream

2015-06-14 Thread Josh Berkus
On 06/14/2015 11:44 PM, Craig Ringer wrote:
 Has anything actually happened on the Greenplum release at this point?
 
 All I can find are the same old press releases.
 

Nothing has happened.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The Future of Aggregation

2015-06-14 Thread David Rowley
On 12 June 2015 at 23:57, David Rowley david.row...@2ndquadrant.com wrote:

 On 11 June 2015 at 01:39, Kevin Grittner kgri...@ymail.com wrote:


 One question that arose in my mind running this was whether we might
 be able to combine sum(x) with count(*) if x was NOT NULL, even
 though the arguments don't match.  It might not be worth the
 gymnastics of recognizing the special case, and I certainly
 wouldn't recommend looking at that optimization in a first pass;
 but it might be worth jotting down on a list somewhere


 I think it's worth looking into that at some stage. I think I might have
 some of the code that would be required for the NULL checking over here -
 http://www.postgresql.org/message-id/CAApHDvqRB-iFBy68=dcgqs46arep7aun2pou4ktwl8kx9yo...@mail.gmail.com

 I'm just not so sure what the logic would be to decide when we could apply
 this. The only properties I can see that may be along the right lines are
 pg_proc.pronargs for int8inc and int8inc_any.


Actually a possible realistic solution to this just came to me.

Add another property to pg_aggregate which stores the oid of an alternative
aggregate function which can be used when all arguments to the transition
function can be proved to be not null. This of course would be set to
InvalidOid in most cases.

But at least one good use case would be COUNT(not_null_expr), which could
then use COUNT(*)'s implementation.

This likely could not be applied if DISTINCT or ORDER BY were used.
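
A trivial illustration of the equivalence that could be exploited (the
rewrite itself is only hypothetical at this point):

create table t (x int not null);
-- Because x can never be NULL these two are guaranteed to give the same
-- answer, so count(x) could in principle use count(*)'s cheaper transition
-- function (int8inc) instead of int8inc_any.
select count(x) from t;
select count(*) from t;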

I'm not proposing that we go and do this. More analysis would have to be
done to see if the trade-offs (extra code, extra planning time)
produce a net win overall. The aggregate code has already gotten quite a
bit more complex over the past couple of releases.

--
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services