Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-10 Thread 德哥
Hi,
My case is concurrent updates of a single row (for example, 1000 clients
updating the same row at the same time), and the target is to avoid waiting
on other lock waiters (quick return to the client).
Using an advisory lock is one method for a quick return, but it is not good:
it must be wrapped in a function (to reduce client-database network round
trips).
If UPDATE could skip locked rows in this case, performance could improve
very much.
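
A minimal sketch of that advisory-lock workaround (assuming one lock key
per row id): only the session that wins the lock touches the row; the
others return immediately instead of queuing on the row lock:

update tbl set x=x-1
 where id=1 and x>0
   and pg_try_advisory_xact_lock(1);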


Example case (all sessions update the same row):
session a:
update tbl set x=x-1 where id=1 and x>0;
session b:
update tbl set x=x-1 where id=1 and x>0;
...
session x:
update tbl set x=x-1 where id=1 and x>0;


best regards,
digoal


At 2015-11-10 01:38:45, "Jeff Janes"  wrote:
>On Mon, Nov 9, 2015 at 9:06 AM, Tom Lane  wrote:
>> 德哥  writes:
>>>PostgreSQL 9.5 added skip locked to select for update to improve 
>>> concurrency performance, but why not add it to update sql?
>>
>> Seems like you'd have unpredictable results from the update then.
>
>But with use of RETURNING, you could at least know what those results
>were and so could deal with the unpredictability.
>
>I don't understand Digoal's use case (Google Translate does a poor job
>on the linked page), but this would be handy in conjunction with LIMIT
>(which also doesn't exist for UPDATE right now).
>
>update work_queue set assigned='Jeff' where assigned is null and
>skills_needed <@ '{whining,"breaking things"}'  limit 1 skip locked
>returning id, description
>
>In 9.5 you will be able to do it with a subselect, but putting them
>directly on the UPDATE would be easier to understand, and perhaps more
>efficient to execute.
>
>Cheers,
>
>Jeff
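
For reference, the 9.5 subselect formulation Jeff alludes to would look
roughly like this (a sketch against his hypothetical work_queue table):

update work_queue set assigned='Jeff'
 where id = (select id from work_queue
              where assigned is null
                and skills_needed <@ '{whining,"breaking things"}'
              limit 1
              for update skip locked)
returning id, description;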






Re: [HACKERS] [COMMITTERS] pgsql: Translation updates

2015-11-10 Thread Alvaro Herrera
Peter Eisentraut wrote:
> Translation updates
> 
> Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
> Source-Git-Hash: cd263526676705b4a8a3a708c9842461c4a2bcc3

Hi Peter,

Would you please document this process?  I know it all starts with this
repository,
http://git.postgresql.org/gitweb/?p=pgtranslation/admin.git
and then cp-po-back is used to copy the files from the messages.git repo
to the postgresql.git repo, but it would be better to have more details.
Is there more to it?

(It would also be good to know how to create new branches and how to
destroy one when it goes out of support.)

I would do this by editing this page
https://wiki.postgresql.org/wiki/NLS/admin
and then adding a link to it from
https://wiki.postgresql.org/wiki/Release_process#Before-wrap_tasks
(I decided against doing it myself in case you have different ideas.)

Thanks

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2015-11-10 Thread Jesper Pedersen

Hi,

On 11/09/2015 05:10 PM, Andres Freund wrote:

Each graph has a full initdb + pgbench -i cycle now.


That looks about as we'd expect: the lock-free pinning doesn't matter
and asynchronous commit is beneficial. I think our bottlenecks in write
workloads are sufficiently elsewhere that it's unlikely that buffer pins
make a lot of difference.



Using

 https://commitfest.postgresql.org/7/373/

shows that the CLog queue is max'ed out on the number of client connections.


You could try a readonly pgbench workload (i.e. -S), to see whether a
difference is visible there. For a pgbench -S workload it's more likely
that you only see significant contention on larger machines. If you've a
workload that touches more cached buffers, it'd be visible earlier.



Yeah, basically no difference between the 4 -S runs on this setup.


I know, I have a brown paper bag somewhere.


Why? This looks as expected, and the issues from the previous run were
easy mistakes to make?



I should have known to do the full cycle of initdb / pgbench -i in the 
first place.


Best regards,
 Jesper


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] Table partition + join pushdown

2015-11-10 Thread Kouhei Kaigai
Hi, here are my comments on the patch.

Overall comments

* I think the enhancement in copyfuncs.c should be a separate
  patch; that is the more graceful manner. At this moment there are
  fewer than 20 Path-derived type definitions, so it is much easier
  work than the entire Plan node support we did recently.
  (How about other folks' opinions?)

* Can you integrate the attached test cases into the regression tests?
  That is the more generic way, and it allows people to detect problems
  if the relevant feature gets broken in future updates.

* The name "join pushdown" is a bit misleading, because other
  components also use this term for a different purpose.
  I'd like to suggest try_pullup_append_across_join.
  Any ideas from native English speakers?

Patch review


At try_join_pushdown:
+   /* When specified outer path is not an AppendPath, nothing to do here. */
+   if (!IsA(outer_rel->cheapest_total_path, AppendPath))
+   {
+   elog(DEBUG1, "Outer path is not an AppendPath. Do nothing.");
+   return;
+   }
It checks whether the cheapest_total_path is an AppendPath at the head
of this function. This ought to be a loop walking the pathlist of the
RelOptInfo, because multiple path nodes might still be alive that are
not the cheapest_total_path.


+   switch (inner_rel->cheapest_total_path->pathtype)
+
Also, we can construct the new Append node if at least one of the
path nodes within the pathlist of inner_rel is supported.


+   if (list_length(inner_rel->ppilist) > 0)
+   {
+   elog(DEBUG1, "ParamPathInfo is already set in inner_rel. Can't pushdown.");
+   return;
+   }
+
You may need to explain here why this feature uses ParamPathInfos.
It seems to me a good hack for attaching additional qualifiers to the
underlying inner scan node, even if it is not a direct child of the
inner relation.
However, people may have different opinions.

+static List *
+convert_parent_joinclauses_to_child(PlannerInfo *root, List *join_clauses,
+   RelOptInfo *outer_rel)
+{
+   Index   parent_relid =
+   find_childrel_appendrelinfo(root, outer_rel)->parent_relid;
+   List*clauses_parent = get_actual_clauses(join_clauses);
+   List*clauses_child = NIL;
+   ListCell*lc;
+
+   foreach(lc, clauses_parent)
+   {
+   Node*one_clause_child = (Node *) copyObject(lfirst(lc));
+
+   ChangeVarNodes(one_clause_child, parent_relid, outer_rel->relid, 0);
+   clauses_child = lappend(clauses_child, one_clause_child);
+   }
+
+   return make_restrictinfos_from_actual_clauses(root, clauses_child);
+}

Is ChangeVarNodes() the right routine to replace var-nodes of the parent
relation with the corresponding var-nodes of the child relation?
It may look sufficient; however, nobody can ensure that a child
relation's varattno is identical to the parent relation's.
For example, which attribute number shall be assigned to 'z' here?
  CREATE TABLE tbl_parent(x int);
  CREATE TABLE tbl_child(y int) INHERITS(tbl_parent);
  ALTER TABLE tbl_parent ADD COLUMN z int;
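
Here 'z' gets attnum 2 in tbl_parent but attnum 3 in tbl_child (where y
already occupies attnum 2), so substituting relids alone maps it to the
wrong column. A quick way to see the mismatch with the tables above:

  SELECT attname, attnum FROM pg_attribute
   WHERE attrelid = 'tbl_child'::regclass AND attnum > 0;
  -- x=1, y=2, z=3, whereas tbl_parent has x=1, z=2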

--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4230,8 +4230,14 @@ prepare_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys,
/*
 * Ignore child members unless they match the rel being
 * sorted.
+*
+* If this is called from make_sort_from_pathkeys(),
+* relids may be NULL. In this case, we must not ignore child
+* members because inner/outer plan of pushed-down merge join is
+* always child table.
 */
-   if (em->em_is_child &&
+   if (relids != NULL &&
+   em->em_is_child &&
!bms_equal(em->em_relids, relids))
continue;

It is a little bit hard to understand why this modification is needed.
Could you add a source code comment that focuses on the reason why?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Taiki Kondo
> Sent: Wednesday, October 21, 2015 8:07 PM
> To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI
> Cc: pgsql-hackers@postgresql.org; Yanagisawa Hiroshi(柳澤 博)
> Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown
> 
> Hello, KaiGai-san and Horiguchi-san.
> 
> I created v2 patch. Please find attached.
> I believe this patch will fix the most of issues mentioned by
> Horiguchi-san except naming.
> 
> In this v2 patch, scan node which is originally inner relation of
> Join node must be SeqScan (or SampleScan). This limitation is
> due to implementation of try_join_pushdown(), which copies Path nodes
> to attach new filtering conditions converted from CHECK() constraints.
> 
> It uses copyObject() for this purpose, so I must 

Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-10 Thread Tom Lane
Michael Paquier  writes:
> While going through the release notes of 9.5 I noticed the following
> chunk in doc/src/sgml/release-9.5.sgml:
> Add per-table autovacuum logging control via
> log_min_autovacuum_duration (Michael Paquier)
> NOT DOCUMENTED?

Yeah ... I was going to do something about that on Sunday, until I got
sidetracked by the docs build blowing up on me ...

> This is actually documented in src/sgml/ref/create_table.sgml with the
> following paragraph so this mention in the release notes does not seem
> needed:
> log_autovacuum_min_duration, toast.log_autovacuum_min_duration (integer)
> Custom log_autovacuum_min_duration parameter.

Hmm ... that doesn't seem exactly transparent; what does "custom" mean?
Should it read "Overrides log_autovacuum_min_duration for autovacuum
operations on this specific table or toast table"?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation tweak for row-valued expressions and null

2015-11-10 Thread Jim Nasby

On 7/26/15 8:40 PM, Thomas Munro wrote:

I wonder if it might be worth adding a tiny note to the manual to
point out that the special logic for "row-valued expression IS [ NOT
] NULL" doesn't apply anywhere else that we handle nulls or talk about
[non]-null values in the manual.  See attached.


Yes. This is a common point of confusion.
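
For instance, a quick illustration of the special-case logic in question:

SELECT ROW(1, NULL) IS NULL;      -- false: not every field is null
SELECT ROW(1, NULL) IS NOT NULL;  -- also false: not every field is non-null
SELECT ROW(NULL, NULL) IS NULL;   -- true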
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Magnus Hagander
On Tue, Nov 10, 2015 at 1:46 AM, Tom Lane  wrote:

> I wrote:
> > Curiously though, that gets us down to this:
> >  30615 strings out of 245828
> >  397721 string characters out of 1810780
> > which implies that indeed FlowObjectSetup *is* the cause of most of
> > the strings being entered.  I'm not sure how that squares with the
> > observation that there are less than 5000 \pagelabel entries in the
> > postgres-US.aux file.  Time for more digging.
>
> Well, after much digging, I've found what seems a workable answer.
> It turns out that the original form of FlowObjectSetup is just
> unbelievably awful when it comes to handling of hyperlink anchors:
> it will put a hyperlink anchor into the PDF for every "flow object",
> that is, everything in the document that could possibly have a link
> to it, whether or not it actually is linked to.  And aside from bloating
> the PDF file, it turns out that the hyperlink stuff also consumes some
> control sequence names, which is why we're running out of strings.
>
> There already is logic (probably way older than the hyperlink code)
> in jadetex to avoid generating page-number labels for objects that have
> no cross-references.  So what I did to fix this was to piggyback on
> that code: with the attached jadetex.cfg, both a page-number label
> and a hyperlink anchor will be generated for all and only those flow
> objects that have either a page-number reference or a hyperlink reference.
> (We could try to separate those things, but then we'd need two control
> sequence names not one per object for tracking purposes, and anyway many
> objects will have both kinds of reference if they have either.)
>
> This gets us down to ~135000 strings to build HEAD, and not incidentally,
> the resulting PDF is about half the size it was before.  I think I've
> also fixed a number of formerly unexplainable broken hyperlinks in the
> PDF; some are still broken, but they were that way before.  (It looks
> like xref with endterm doesn't work very well in jadetex; all the
> remaining bad links seem to be associated with uses of that.)
>
> Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
> it at least into 9.5, maybe further, because I'm afraid we may be closer
> than we realized to exceeding the strings limit in the back branches too.
>

Impressive, indeed.

When you say it's half the size - is that half the size of the preprocessed
PDF or is it also after the stuff we do on the website PDFs using
jpdftweak? IIRC that tweak is only there to deal with the size, and
specifically it deals with "bookmarks" which sounds a lot like this...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Adam Brightwell
> In commit 5d1ff6bd559ea8df I'd expected that the
> WARNINGs would certainly show up in regression test output, and I thought
> I'd verified that that was the case --- did that not happen for you?

I just double-checked with both 'check' and 'check-world' and neither
seemed to have an issue with it.  Though, I do see the warning show up
in 'regress/log/postmaster.log'.

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Magnus Hagander
On Tue, Nov 10, 2015 at 6:15 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > When you say it's half the size - is that half the size of the
> preprocessed
> > PDF or is it also after the stuff we do on the website PDFs using
> > jpdftweak? IIRC that tweak is only there to deal with the size, and
> > specifically it deals with "bookmarks" which sounds a lot like this...
>
> I'm just looking at the size of the file produced by "make
> postgres-A4.pdf".
>
> I don't know anything about jpdftweak, but if it's being used to get rid
> of unreferenced hyperlink anchors, maybe we could dispense with that step
> after this goes in.
>
>
Yeah, that's what I was hoping.  You can see how it's used in the
mk-release-pdfs script on borka. It doesn't explain why we're doing it;
that's probably in the list archives somewhere, but it does show what we
do. So it should be easy enough to see if the benefit goes away :) (it
would also be nice for build times - that jpdftweak step is very very slow)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Adam Brightwell
On Tue, Nov 10, 2015 at 9:18 AM, Adam Brightwell
 wrote:
>> I'm with Alvaro: the most interesting question here is why that mistake
>> did not blow up on you immediately.  I thought we had enough safeguards
>> in place to catch this type of error.
>
> Ok, I'll explore that a bit further as I was able to build and use
> with my hook without any issue. :-/

Ok, I have verified that it does indeed eventually raise a warning
about this.  It took a while for it to occur/appear in my logs. I was
expecting it to be more "immediate".  At any rate, I don't believe
there are any issues with the safeguards in place.

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Alvaro Herrera
Magnus Hagander wrote:
> On Tue, Nov 10, 2015 at 6:15 PM, Tom Lane  wrote:
> 
> > Magnus Hagander  writes:
> > > When you say it's half the size - is that half the size of the
> > preprocessed
> > > PDF or is it also after the stuff we do on the website PDFs using
> > > jpdftweak? IIRC that tweak is only there to deal with the size, and
> > > specifically it deals with "bookmarks" which sounds a lot like this...
> >
> > I'm just looking at the size of the file produced by "make
> > postgres-A4.pdf".
> >
> > I don't know anything about jpdftweak, but if it's being used to get rid
> > of unreferenced hyperlink anchors, maybe we could dispense with that step
> > after this goes in.
> >
> >
> Yeah, that's what I was hoping.  You can see how it's used in the
> mk-release-pdfs script on borka. It doesn't explain why we're doing it;
> that's probably in the list archives somewhere, but it does show what we
> do. So it should be easy enough to see if the benefit goes away :) (it
> would also be nice for build times - that jpdftweak step is very very slow)

http://www.postgresql.org/message-id/flat/1284678175.2459.21.ca...@hp-laptop2.gunduz.org#1284678175.2459.21.ca...@hp-laptop2.gunduz.org

http://www.postgresql.org/message-id/flat/AANLkTi=3bkqc3ScM5Y==NPeY0_4uLFy+yGD9=gj-n...@mail.gmail.com#AANLkTi=3bkqc3ScM5Y==NPeY0_4uLFy+yGD9=gj-n...@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Tom Lane
Adam Brightwell  writes:
> On Tue, Nov 10, 2015 at 9:18 AM, Adam Brightwell
>  wrote:
>>> I'm with Alvaro: the most interesting question here is why that mistake
>>> did not blow up on you immediately.  I thought we had enough safeguards
>>> in place to catch this type of error.

>> Ok, I'll explore that a bit further as I was able to build and use
>> with my hook without any issue. :-/

> Ok, I have verified that it does indeed eventually raise a warning
> about this.  It took a while for it to occur/appear in my logs. I was
> expecting it to be more "immediate".

Hm.  Salesforce had also run into the issue that the warnings are just
warnings and relatively easy to miss.  What we did there was to change
the elog(WARNING)s near the bottom of load_relcache_init_file to
elog(ERROR), but I'm not sure if I want to recommend doing that in the
community sources.  In commit 5d1ff6bd559ea8df I'd expected that the
WARNINGs would certainly show up in regression test output, and I thought
I'd verified that that was the case --- did that not happen for you?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Tom Lane
Magnus Hagander  writes:
> When you say it's half the size - is that half the size of the preprocessed
> PDF or is it also after the stuff we do on the website PDFs using
> jpdftweak? IIRC that tweak is only there to deal with the size, and
> specifically it deals with "bookmarks" which sounds a lot like this...

I'm just looking at the size of the file produced by "make postgres-A4.pdf".

I don't know anything about jpdftweak, but if it's being used to get rid
of unreferenced hyperlink anchors, maybe we could dispense with that step
after this goes in.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] storage/buffer/README docs about buffer replacement are out of date

2015-11-10 Thread Alvaro Herrera
Andres Freund wrote:

> Now a) and b) are recent oversights of mine. I'd apparently not realized
> that there's detailed docs on this in buffer/README. But c) is pretty
> old - essentially 5d50873 from 2005.
> 
> I wonder if it's worthwhile to go into that level of detail - seems
> kinda likely to get out of date, as evidenced by it being outdated for
> ~10 years.

I think it makes sense to keep a high-level overview in the README; in
particular the description of how users of this API would use it should
be there.  But the implementation details should live in comments inside
the file.  I don't think the details of the buffer replacement algorithm
should be in the README.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Nov 10, 2015 at 6:15 PM, Tom Lane  wrote:
>> I don't know anything about jpdftweak, but if it's being used to get rid
>> of unreferenced hyperlink anchors, maybe we could dispense with that step
>> after this goes in.

> Yeah, that's what I was hoping.  You can see how it's used in the
> mk-release-pdfs script on borka.

Hmm ... building current HEAD in A4 format, I get

                             HEAD       with my patch

initially generated PDF:     26.30MB    13.25MB
after jpdftweak:              7.24MB     7.24MB

Evidently, jpdftweak *is* removing unreferenced bookmarks --- the output
file sizes would not be so close to identical otherwise.  But it's
evidently doing more than that.  The initially generated PDFs are
fairly compressible by "gzip", while jpdftweak's outputs are not, so
I suppose that the additional savings come from applying compression.
jpdftweak's help output indicates that the "-ocs" options are selecting
aggressive compression.

I tried removing the load/save bookmarks steps from mk-release-pdfs,
but what I get is files that are a little smaller yet and again almost the
same size for either starting point; that probably means that jpdftweak's
default behavior is to strip all bookmarks :-(.

Maybe we could look around for another tool that just does PDF compression
and not the other stuff ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT documentation clean-up patch

2015-11-10 Thread Peter Geoghegan
As discussed on IM, I think we should make the insert.sgml
documentation even closer to select.sgml than before, in that
parameters ought to be discussed in different sections, and not in one
large block. insert.sgml is too complicated for that approach now.

Attached revision (rebase) of your modified version of my patch (the
modification you provided privately) has the following notable
changes:

* New section for "Inserting" parameters. Now, "On Conflict Clause"
section is a subsection of the Standard Parameters' parent section (so
they're siblings).

This seemed like the best division of parameters here. It didn't seem
to make much sense to imagine that we ought to have multiple very
specific categories in the style of select.sgml (meaning that the old
insert text would have to be integrated with this new section
discussing "Insertion" parameters, I suppose) -- we didn't go far
enough in this direction before, but that would be going too far IMV.

* The term "unique index inference clause" has been removed. Inference
is now something that conflict_target sometimes (or usually) does.
There is no clause that does inference that isn't exactly
conflict_target.

* As in my original, NOT DEFERRABLE constraints are the only type
supported -- we should not mention "deferred" constraints at all. You
changed that back. I changed it back again here.

* "ON CONFLICT Clause" section now mentions ON CONFLICT DO
UPDATE/NOTHING far sooner. I think this is far clearer.

* output_expression was said not to project updated columns from
RETURNING, which is just wrong. This is fixed.

* General further copy-editing. Establishing the ON CONFLICT context
significantly improved the flow of discussing the new-to-9.5
parameters. I did more than I'd planned to here, but I think it's
shorter overall, and is certainly more consistent. I'd also say that
it reads better.

Thoughts?
-- 
Peter Geoghegan
From 7ea554388a9050db65fe23b606a4d34f2eeff9dd Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 10 Nov 2015 00:02:49 +0100
Subject: [PATCH] Improve ON CONFLICT documentation.

Author: Peter Geoghegan and Andres Freund
Discussion: cam3swzscpwzq-7ejc77vwqzz1go8gnmurq1qqdq3wrn7abw...@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
---
 doc/src/sgml/ref/insert.sgml | 743 ++-
 1 file changed, 378 insertions(+), 365 deletions(-)

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 8caf5fe..945eb69 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -99,7 +99,8 @@ INSERT INTO table_name [ AS 
You must have INSERT privilege on a table in
order to insert into it.  If ON CONFLICT DO UPDATE is
-   present the UPDATE privilege is also required.
+   present, UPDATE privilege on the table is also
+   required.
   
 
   
@@ -126,366 +127,378 @@ INSERT INTO table_name [ AS 
   Parameters
 
-  
-   
-with_query
-
- 
-  The WITH clause allows you to specify one or more
-  subqueries that can be referenced by name in the INSERT
-  query. See  and 
-  for details.
- 
- 
-  It is possible for the query
-  (SELECT statement)
-  to also contain a WITH clause.  In such a case both
-  sets of with_query can be referenced within
-  the query, but the
-  second one takes precedence since it is more closely nested.
- 
-
-   
-
-   
-table_name
-
- 
-  The name (optionally schema-qualified) of an existing table.
- 
-
-   
-
-   
-alias
-
- 
-  A substitute name for the target table. When an alias is provided, it
-  completely hides the actual name of the table.  This is particularly
-  useful when using ON CONFLICT DO UPDATE into a table
-  named excluded as that's also the name of the
-  pseudo-relation containing the proposed row.
- 
-
-   
-
-
-   
-column_name
-
- 
-  The name of a column in the table named by table_name.
-  The column name can be qualified with a subfield name or array
-  subscript, if needed.  (Inserting into only some fields of a
-  composite column leaves the other fields null.)  When
-  referencing a column with ON CONFLICT DO UPDATE, do
-  not include the table's name in the specification of a target
-  column.  For example, INSERT ... ON CONFLICT DO UPDATE
-  tab SET table_name.col = 1 is invalid (this follows the general
-  behavior for UPDATE).
- 
-
-   
-
-   
-DEFAULT VALUES
-
- 
-  All columns will be filled with their default values.
- 
-
-   
-
-   
-expression
-
- 
-  An expression or value to assign to the corresponding column.
- 
-
-   
-
-   
-DEFAULT
-
- 
-  The corresponding column will be filled with
-  its default value.
- 
-
-   
-
-   
-query
-
- 
-  A query (SELECT statement) that supplies the
-  

Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Tom Lane
Adam Brightwell  writes:
>> In commit 5d1ff6bd559ea8df I'd expected that the
>> WARNINGs would certainly show up in regression test output, and I thought
>> I'd verified that that was the case --- did that not happen for you?

> I just double-checked with both 'check' and 'check-world' and neither
> seemed to have an issue with it.  Though, I do see the warning show up
> in 'regress/log/postmaster.log'.

I poked around a bit and figured out what is wrong: for shared catalogs,
this message would be emitted during RelationCacheInitializePhase2(),
which is executed before we perform client authentication (as indeed
is rather your point here).  That means ClientAuthInProgress is still
true, which means elog.c doesn't honor client_min_messages --- it
will only send ERROR or higher messages to the client.  So the message
goes to the postmaster log, but not to the client.

When I checked the behavior of 5d1ff6bd559ea8df, I must have only
tried it for unshared catalogs.  Those are set up by
RelationCacheInitializePhase3, which is post-authentication, so the
message comes out and causes regression test failures as expected.

This is kind of annoying :-(.  As noted in elog.c, it doesn't seem
like a terribly good idea to send WARNING messages while the client
is still in authentication mode; we can't be very sure that clients
will react desirably.  So we can't fix it by mucking with that.

One answer is to promote the case to an ERROR.  We could (probably) keep
a bad initfile from becoming a permanent lockout condition by unlinking
the initfile before reporting ERROR, but this way still seems like a
reliability hazard that could be worse than the original problem.

Another idea, which seems pretty grotty but might be workable, is
to save aside some state about the failure during
RelationCacheInitializePhase2 and then actually send the WARNING
during RelationCacheInitializePhase3.  But that seems a little Rube
Goldberg-ish, and who's to say it couldn't break?
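
A rough sketch of that save-aside idea, with a hypothetical flag name and
message text:

    /* set during RelationCacheInitializePhase2 instead of elog(WARNING) */
    static bool shared_initfile_was_broken = false;

    /* then, post-authentication, in RelationCacheInitializePhase3 */
    if (shared_initfile_was_broken)
        elog(WARNING, "shared relcache init file was invalid and has been removed");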

But I don't think I want to just do nothing.  We already found out
how easy it is to not realize that we have a bug here, leading to
a serious backend-startup-performance issue.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Kouhei Kaigai



> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Joe Conway
> Sent: Tuesday, November 10, 2015 3:08 AM
> To: Craig Ringer; Adam Brightwell
> Cc: PostgreSQL Hackers
> Subject: Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization
> 
> On 11/08/2015 11:17 PM, Craig Ringer wrote:
> > On 9 November 2015 at 12:40, Adam Brightwell
> >  wrote:
> >> Hi All,
> >>
> >> While working on an auth hook, I found that I was unable to access the
> >> pg_shseclabel system table while processing the hook.  I discovered
> >> that the only tables that were bootstrapped and made available at this
> >> stage of the the auth process were pg_database, pg_authid and
> >> pg_auth_members.  Unfortunately, this is problematic if you have
> >> security labels that are associated with a role which are needed to
> >> determine auth decisions/actions.
> >>
> >> Given that the shared relations currently exposed can also have
> >> security labels that can be used for auth purposes, I believe it makes
> >> sense to make those available as well.  I have attached a patch that
> >> adds this functionality for review/discussion.  If this functionality
> >> makes sense I'll add it to the commitfest.
> >
> > Your reasoning certainly makes sense to me. I'm a little surprised
> > this didn't cause issues for SEPostgreSQL already.
> 
> Currently sepgsql at least does not support security labels on roles,
> even though nominally postgres does. If the label provider does not
> support them (as in sepgsql) you just get a "feature not supported" type
> of error when trying to create the label. I'm not sure if there are any
> other label providers in the wild other than sepgsql, but I should think
> they would all benefit from this change.
>
sepgsql does not support it, and its security model intends to assign
client privileges orthogonally to the database role; thus, it didn't touch
pg_shseclabel at that time. However, we can easily expect another label
provider that references database roles.

> +1 for adding to the next commitfest.
>
Me also.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-11-10 Thread Amit Langote
On 2015/10/29 23:22, Syed, Rahila wrote:
> Please find attached an updated patch.

A few more comments on v6:

>   relname = RelationGetRelationName(onerel);
> + schemaname = get_namespace_name(RelationGetNamespace(onerel));
>   ereport(elevel,
>   (errmsg("vacuuming \"%s.%s\"",
>   
> get_namespace_name(RelationGetNamespace(onerel)),
>   relname)));
> +snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", schemaname);
> + strcat(progress_message[0],".");
> + strcat(progress_message[0],relname);

How about the following instead -

+ snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s",
+   generate_relation_name(onerel));

>   if (next_not_all_visible_block >= SKIP_PAGES_THRESHOLD)
> + {
>   skipping_all_visible_blocks = true;
> + if(!scan_all)
> + total_heap_pages = total_heap_pages - 
> next_not_all_visible_block;
> + }
>   else
>   skipping_all_visible_blocks = false;

...

>*/
>   if (next_not_all_visible_block - blkno > 
> SKIP_PAGES_THRESHOLD)
> + {
>   skipping_all_visible_blocks = true;
> + if(!scan_all)
> + total_heap_pages = total_heap_pages - 
> (next_not_all_visible_block - blkno);
> + }

Fujii-san's review comment about these code blocks does not seem to be
addressed. He suggested keeping total_heap_pages fixed while adding the
number of skipped pages to that of scanned pages. For that, why not add a
scanned_heap_pages variable instead of using vacrelstats->scanned_pages?
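
A rough sketch of that suggestion, with scanned_heap_pages as a
hypothetical new counter: leave total_heap_pages untouched and credit the
skipped all-visible pages to the scanned side instead:

if (next_not_all_visible_block >= SKIP_PAGES_THRESHOLD)
{
    skipping_all_visible_blocks = true;
    if (!scan_all)
        scanned_heap_pages += next_not_all_visible_block;
}
else
    skipping_all_visible_blocks = false;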

> + if (has_privs_of_role(GetUserId(), beentry->st_userid))
> + {
> + values[2] = 
> UInt32GetDatum(beentry->st_progress_param[0]);
> + values[3] = 
> UInt32GetDatum(beentry->st_progress_param[1]);
> + values[4] = 
> UInt32GetDatum(beentry->st_progress_param[2]);
> + values[5] = 
> UInt32GetDatum(beentry->st_progress_param[3]);
> + values[6] = UInt32GetDatum(total_pages);
> + values[7] = UInt32GetDatum(scanned_pages);
> +
> + if (total_pages != 0)
> + values[8] = Float8GetDatum(scanned_pages * 100 
> / total_pages);
> + else
> + nulls[8] = true;
> + }
> + else
> + {
> + values[2] = CStringGetTextDatum("<insufficient privilege>");
> + nulls[3] = true;
> + nulls[4] = true;
> + nulls[5] = true;
> + nulls[6] = true;
> + nulls[7] = true;
> + nulls[8] = true;
> + }

This is most likely not correct, that is, putting a text datum into a
supposedly int4 column. I see this when I switch to an unprivileged user:

pgbench=# \x
pgbench=# \c - other
pgbench=> SELECT * FROM pg_stat_vacuum_progress;
-[ RECORD 1 ]---+
pid | 20395
table_name  | public.pgbench_accounts
total_heap_pages| 44895488
scanned_heap_pages  |
total_index_pages   |
scanned_index_pages |
total_pages |
scanned_pages   |
percent_complete|

I'm not sure if applying the privilege check for columns of
pg_stat_vacuum_progress is necessary, but I may be wrong.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2015-11-10 Thread Artur Zakirov

08.11.2015 14:23, Artur Zakirov wrote:

Thank you for reply.

This was because of the flag field size of the SPELL struct: long
flags were being truncated in the .dict file.

I attached a new patch. It is a temporary patch, not final. It can be done
better.



I have updated the patch and attached it. Now dynamic memory allocation
is used for the flag field of the SPELL struct.


I have measured the loading time of a dictionary and the memory used by a
dictionary with the new patch. A dictionary is loaded at the first
reference to it, for example when the ts_lexize function is executed, so
the first ts_lexize call takes more time than the second.


The following table shows the performance of some dictionaries before and
after the patch on my machine.


---------------------------------------------------
|       |  loading time, ms  |     memory, MB     |
|       |  before  |  after  |  before  |  after  |
---------------------------------------------------
| ar    |   700    |   300   |   23.7   |  15.7   |
| br_fr |   410    |   450   |   27.4   |  27.5   |
| ca    |   248    |   245   |   14.7   |  15.4   |
| en_us |   100    |   100   |    5.4   |   6.2   |
| fr    |   160    |   178   |   13.7   |  14.1   |
| gl_es |   160    |   150   |    9     |   9.4   |
| is    |   260    |   202   |   16.1   |  16.3   |
---------------------------------------------------

As you can see, loading time and memory usage are essentially the same
before and after the patch.


Link to patch in commitfest:
https://commitfest.postgresql.org/8/420/

Link to regression tests:
https://dl.dropboxusercontent.com/u/15423817/HunspellDictTest.tar.gz

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/src/backend/tsearch/spell.c
--- b/src/backend/tsearch/spell.c
***
*** 153,159  cmpspell(const void *s1, const void *s2)
  static int
  cmpspellaffix(const void *s1, const void *s2)
  {
! 	return (strncmp((*(SPELL *const *) s1)->p.flag, (*(SPELL *const *) s2)->p.flag, MAXFLAGLEN));
  }
  
  static char *
--- 153,159 
  static int
  cmpspellaffix(const void *s1, const void *s2)
  {
! 	return (strncmp((*(SPELL *const *) s1)->flag, (*(SPELL *const *) s2)->flag, MAXFLAGLEN));
  }
  
  static char *
***
*** 237,242  cmpaffix(const void *s1, const void *s2)
--- 237,309 
  	   (const unsigned char *) a2->repl);
  }
  
+ static unsigned short
+ decodeFlag(IspellDict *Conf, char *sflag, char **sflagnext)
+ {
+ 	unsigned short	s;
+ 	char		   *next;
+ 
+ 	switch (Conf->flagMode)
+ 	{
+ 		case FM_LONG:
+ 			s = (int)sflag[0] << 8 | (int)sflag[1];
+ 			if (sflagnext)
+ *sflagnext = sflag + 2;
+ 			break;
+ 		case FM_NUM:
+ 			s = (unsigned short) strtol(sflag, &next, 10);
+ 			if (sflagnext)
+ 			{
+ if (next)
+ {
+ 	*sflagnext = next;
+ 	while (**sflagnext)
+ 	{
+ 		if (**sflagnext == ',')
+ 		{
+ 			*sflagnext = *sflagnext + 1;
+ 			break;
+ 		}
+ 		*sflagnext = *sflagnext + 1;
+ 	}
+ }
+ else
+ 	*sflagnext = 0;
+ 			}
+ 			break;
+ 		default:
+ 			s = (unsigned short) *((unsigned char *)sflag);
+ 			if (sflagnext)
+ *sflagnext = sflag + 1;
+ 	}
+ 
+ 	return s;
+ }
+ 
+ static bool
+ isAffixFlagInUse(IspellDict *Conf, int affix, unsigned short affixflag)
+ {
+ 	char *flagcur;
+ 	char *flagnext = 0;
+ 
+ 	if (affixflag == 0)
+ 		return true;
+ 
+ 	flagcur = Conf->AffixData[affix];
+ 
+ 	while (*flagcur)
+ 	{
+ 		if (decodeFlag(Conf, flagcur, &flagnext) == affixflag)
+ 			return true;
+ 		if (flagnext)
+ 			flagcur = flagnext;
+ 		else
+ 			break;
+ 	}
+ 
+ 	return false;
+ }
+ 
  static void
  NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
  {
***
*** 255,261  NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
  	}
  	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
  	strcpy(Conf->Spell[Conf->nspell]->word, word);
! 	strlcpy(Conf->Spell[Conf->nspell]->p.flag, flag, MAXFLAGLEN);
  	Conf->nspell++;
  }
  
--- 322,328 
  	}
  	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
  	strcpy(Conf->Spell[Conf->nspell]->word, word);
! 	Conf->Spell[Conf->nspell]->flag = (*flag != '\0') ? cpstrdup(Conf, flag) : VoidString;
  	Conf->nspell++;
  }
  
***
*** 355,361  FindWord(IspellDict *Conf, const char *word, int affixflag, int flag)
  	else if ((flag & StopMiddle->compoundflag) == 0)
  		return 0;
  
! 	if ((affixflag == 0) || (strchr(Conf->AffixData[StopMiddle->affix], affixflag) != NULL))
  		return 1;
  }
  node = StopMiddle->node;
--- 422,428 
  	else if ((flag & StopMiddle->compoundflag) == 0)
  		return 0;
  
! 	if (isAffixFlagInUse(Conf, StopMiddle->affix, affixflag))
  		return 1;
  }
  node = StopMiddle->node;
***
*** 394,400  NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
  
  	Affix = Conf->Affix + Conf->naffixes;
  
! 	if 

Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-10 Thread Robert Haas
On Mon, Nov 9, 2015 at 7:46 PM, Tom Lane  wrote:
> I wrote:
>> Curiously though, that gets us down to this:
>>  30615 strings out of 245828
>>  397721 string characters out of 1810780
>> which implies that indeed FlowObjectSetup *is* the cause of most of
>> the strings being entered.  I'm not sure how that squares with the
>> observation that there are less than 5000 \pagelabel entries in the
>> postgres-US.aux file.  Time for more digging.
>
> Well, after much digging, I've found what seems a workable answer.
> It turns out that the original form of FlowObjectSetup is just
> unbelievably awful when it comes to handling of hyperlink anchors:
> it will put a hyperlink anchor into the PDF for every "flow object",
> that is, everything in the document that could possibly have a link
> to it, whether or not it actually is linked to.  And aside from bloating
> the PDF file, it turns out that the hyperlink stuff also consumes some
> control sequence names, which is why we're running out of strings.
>
> There already is logic (probably way older than the hyperlink code)
> in jadetex to avoid generating page-number labels for objects that have
> no cross-references.  So what I did to fix this was to piggyback on
> that code: with the attached jadetex.cfg, both a page-number label
> and a hyperlink anchor will be generated for all and only those flow
> objects that have either a page-number reference or a hyperlink reference.
> (We could try to separate those things, but then we'd need two control
> sequence names not one per object for tracking purposes, and anyway many
> objects will have both kinds of reference if they have either.)
>
> This gets us down to ~135000 strings to build HEAD, and not incidentally,
> the resulting PDF is about half the size it was before.  I think I've
> also fixed a number of formerly unexplainable broken hyperlinks in the
> PDF; some are still broken, but they were that way before.  (It looks
> like xref with endterm doesn't work very well in jadetex; all the
> remaining bad links seem to be associated with uses of that.)
>
> Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
> it at least into 9.5, maybe further, because I'm afraid we may be closer
> than we realized to exceeding the strings limit in the back branches too.

I am in awe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-10 Thread Michael Paquier
On Wed, Nov 11, 2015 at 12:07 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> This is actually documented in src/sgml/ref/create_table.sgml with the
>> following paragraph so this mention in the release notes does not seem
>> needed:
>> log_autovacuum_min_duration, toast.log_autovacuum_min_duration (integer)
>> Custom log_autovacuum_min_duration parameter.
>
> Hmm ... that doesn't seem exactly transparent; what does "custom" mean?

In this context, "custom" means a per-table value that is used instead
of the server-wide value if defined. The toast equivalent uses the
per-table value if not defined explicitly. The other autovacuum
parameters use the same formulation.

> Should it read "Overrides log_autovacuum_min_duration for autovacuum
> operations on this specific table or toast table"?

The same applied for all the other autovacuum and autoanalyze
parameters. Do you think that we should add in the top paragraph of
section "Storage Parameters" a mention of the type "If this parameter
has a server-wide equivalent parameter, the per-table value overrides
its server-wide equivalent if defined" or similar.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Error in char(n) example in documentation

2015-11-10 Thread Peter Geoghegan
I think that this example in the docs [1] is wrong:

"""
Values of type character are physically padded with spaces to the
specified width n, and are stored and displayed that way. However,
trailing spaces are treated as semantically insignificant and
disregarded when comparing two values of type character. In collations
where whitespace is significant, this behavior can produce unexpected
results, e.g. SELECT 'a '::CHAR(2) collate "C" < 'a\n'::CHAR(2)
returns true.
"""

Now, it is the case that the SQL statement will yield true:

postgres=# SELECT 'a '::CHAR(2) collate "C" < 'a\n'::CHAR(2);
 ?column?
--
 t
(1 row)

However, so does the same formulation with varchar, even though this
example is motivated by showing how char(n) differs from other
character types like varchar(n):

postgres=# SELECT 'a '::VARCHAR(2) collate "C" < 'a\n'::VARCHAR(2);
 ?column?
--
 t
(1 row)

I believe that the problem is that commit 8457d0bec did not correctly
transcribe the string constant with C-like escape notation from a C
code comment that it removed as redundant (it was roughly the same
example -- I guess the simplification of the example went too far).
Attached patch fixes the documentation. When an "E" is added so the
C-like escape is correctly interpreted, the char(n) example continues
to yield true:

postgres=# SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2);
 ?column?
--
 t
(1 row)

But changing the cast to varchar will now yield a different result
(perhaps the intuitive result to those not familiar with char(n)
semantics):

postgres=# SELECT 'a '::VARCHAR(2) collate "C" < E'a\n'::VARCHAR(2);
 ?column?
--
 f
(1 row)

[1] http://www.postgresql.org/docs/devel/static/datatype-character.html
-- 
Peter Geoghegan
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 4d883ec..2456a50 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1102,7 +1102,7 @@ SELECT '52093.89'::money::numeric::float8;
 of type character.  In collations where whitespace
 is significant, this behavior can produce unexpected results,
e.g. SELECT 'a '::CHAR(2) collate "C" &lt;
-'a\n'::CHAR(2) returns true.
+E'a\n'::CHAR(2) returns true.
 Trailing spaces are removed when converting a character value
 to one of the other string types.  Note that trailing spaces
 are semantically significant in

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2015-11-10 Thread Andres Freund
On 2015-09-10 17:15:26 +0200, Fabien COELHO wrote:
> Here is a v13, which is just a rebase after 1aba62ec.

And here's v14. It's not something entirely ready. A lot of details have
changed, I unfortunately don't remember them all. But there are more
important things than the details of the patch.

I've played *a lot* with this patch. I found a bunch of issues:

1) The FileFlushContext context infrastructure isn't actually
   correct. There are two problems: First, using the actual 'fd' number to
   reference a to-be-flushed file isn't meaningful. If there are lots
   of files open, fds get reused within fd.c. That part is easily fixed
   by referencing File instead of the fd. The bigger problem is that the
   infrastructure doesn't deal with files being closed. There can be smgr
   invalidations (which aren't that hard to trigger) causing the smgr
   handle, and thus the file, to be closed.

   I think this means that the entire flushing infrastructure actually
   needs to be hoisted up, onto the smgr/md level.

2) I noticed that sync_file_range() blocked far more often than I'd
   expected (see the sketch after this list). Reading the kernel code,
   that turned out to be caused by a pessimization introduced into the
   kernel years ago - in many situations SFR_WRITE waited for the writes.
   A fix for this will be in the 4.4 kernel.

3) I found that latency wasn't improved much for workloads that are
   significantly bigger than shared buffers. The problem here is that
   neither bgwriter nor the backends have, so far, done
   sync_file_range() calls. That meant that the old problem of having
   gigabytes of dirty data that periodically get flushed out, still
   exists. Having these do flushes mostly attacks that problem.
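
For reference on point 2, the call in question is the Linux-specific
sync_file_range(); a sketch of how it is used to initiate writeback
without waiting for completion (not the patch's actual call site):

   /* Linux-specific, needs <fcntl.h> with _GNU_SOURCE: ask the kernel to
    * start writeback for the byte range, without waiting for it */
   sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);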


Benchmarking revealed that for workloads where the hot data set mostly
fits into shared buffers flushing and sorting is anywhere from a small
to a massive improvement, both in throughput and latency. Even without
the patch from 2), although fixing that improves things furhter.



What I did not expect, and what confounded me for a long while, is that
for workloads where the hot data set does *NOT* fit into shared buffers,
sorting often led to be a noticeable reduction in throughput. Up to
30%. The performance was still much more regular than before, i.e. no
more multi-second periods without any transactions happening.

By now I think I know what's going on: Before the sorting portion of the
patch the write-loop in BufferSync() starts at the current clock hand,
by using StrategySyncStart(). But after the sorting that obviously
doesn't happen anymore - buffers are accessed in their sort order. By
starting at the current clock hand and moving on from there the
checkpointer basically makes it less likely that victim buffers
need to be written either by the backends themselves or by
bgwriter. That means that the sorted checkpoint writes can, indirectly,
increase the number of unsorted writes by other processes :(

My benchmarking suggests that that effect is larger the shorter the
checkpoint timeout is. That seems to intuitively make sense, given the
above explanation attempt. If the checkpoint takes longer, the clock hand
will almost certainly soon overtake the checkpoint's 'implicit' hand.

I'm not sure if we can really do anything about this problem. While I'm
pretty jet lagged, I still spent a fair amount of time thinking about
it. Seems to suggest that we need to bring back the setting to
enable/disable sorting :(


What I think needs to happen next with the patch is:
1) Hoist up the FileFlushContext stuff into the smgr layer. Carefully
   handling the issue of smgr invalidations.
2) Replace the boolean checkpoint_flush_to_disk GUC with a list guc that
   later can contain multiple elements like checkpoint, bgwriter,
   backends, ddl, bulk-writes. That seems better than adding GUCs for
   these separately. Then make the flush locations in the patch
   configurable using that.
3) I think we should remove the sort timing from the checkpoint logging
   before commit. It'll always be pretty short.


Greetings,

Andres Freund
>From dd0868d2c714bf18d34f82db40669b435d4b2ba2 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 23 Oct 2015 15:22:04 +0200
Subject: [PATCH] ckpt-14-andres

---
 doc/src/sgml/config.sgml  |  18 ++
 doc/src/sgml/wal.sgml |  12 +
 src/backend/access/heap/rewriteheap.c |   2 +-
 src/backend/access/nbtree/nbtree.c|   2 +-
 src/backend/access/nbtree/nbtsort.c   |   2 +-
 src/backend/access/spgist/spginsert.c |   6 +-
 src/backend/access/transam/xlog.c |  11 +-
 src/backend/storage/buffer/README |   5 -
 src/backend/storage/buffer/buf_init.c |  24 +-
 src/backend/storage/buffer/bufmgr.c   | 365 ++
 src/backend/storage/buffer/freelist.c |   6 +-
 src/backend/storage/buffer/localbuf.c |   3 +-
 src/backend/storage/file/buffile.c 

Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-11-10 Thread Michael Paquier
On Tue, Nov 10, 2015 at 11:17 PM, Michael Paquier  wrote:
> On Sun, Nov 1, 2015 at 9:34 PM, Dean Rasheed  wrote:
>> On 27 October 2015 at 08:24, Dean Rasheed  wrote:
>>> I think it's still feasible to have sind(30) = 0.5 exactly and keep
>>> monotonicity
>>>
>>
>> Here's a patch along those lines. It turned out to be fairly
>> straightforward. It's still basically a thin wrapper on top of the
>> math library trig functions, with a bit of careful scaling to ensure
>> that curves join together to form continuous functions that are
>> monotonic in the expected regions and return exact values in all the
>> special cases 0,30,45,60,90,...
>>
>> I also modified some of the CHECKFLOATVAL() checks which didn't look
>> right to me, unless there's some odd platform-specific behaviour that
>> I'm not aware of, functions like sin and asin should never return
>> infinity.

-   CHECKFLOATVAL(result, isinf(arg1), true);
+   CHECKFLOATVAL(result, false, true);
PG_RETURN_FLOAT8(result);

Hm. I would leave them as-is, and update your patch to do similar checks
in the new functions introduced. See f9ac414 from 2007, which is the result
of the following thread:
http://www.postgresql.org/message-id/200612271844.kbriivb18...@momjian.us
It doesn't seem wise to go backwards regarding those Inf/NaN checks.

> The alternative expected outputs for test float8 need to be updated,
> this is causing regression failures particularly on win32 where 3
> digits are used for exponentials and where tan('NaN') actually results
> in an ERROR. See for example the attached regressions.diffs.

It would be nice to split the results specific to NaN and Infinity into
separate queries. The test for arctangent is one where things should be
split.

c5e86ea took some of the OIDs of this previous patch, so I rebased it as
attached.

result = 1.0 / result;
-   CHECKFLOATVAL(result, true /* cotan(pi/2) == inf */ , true);
+   CHECKFLOATVAL(result, true /* cot(0) == Inf */ , true);
PG_RETURN_FLOAT8(result);
This one is true; it could be corrected as an independent fix.

+   if (isinf(arg1) || arg1 < -1 || arg1 > 1)
+   ereport(ERROR,
+           (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+            errmsg("input is out of range")));
This should check on -1.0 and 1.0?

+   if (arg1 > 90)
+   {
+   /* tand(180-x) = -tand(x) */
+   arg1 = 180 - arg1;
+   sign = -sign;
+   }
Similarly, should the series of checks in atand, dcosd, and asind use .0
precision points? Same remark for other code paths, like cosd_0_to_60 for
example.

+   if (arg1 > 180)
+   {
+   /* tand(360-x) = -tand(x) */
+arg1 = 360 - arg1;
+   sign = -sign;
+   }
Picky detail: be careful of incorrect tab spaces.

=# select acos(-1.1);
 acos
--
  NaN
(1 row)
=# select acosd(-1.1);
ERROR:  22003: input is out of range
LOCATION:  dacosd, float.c:1752
Some results are inconsistent; it seems that we should return NaN in the
second case instead of an error.

I also had a look at the monotonicity of those functions, using rough
queries like the ones below, and things are looking nice. The exact
special-case values are returned and their surroundings are monotonic.
with degrees as (
    select generate_series(89.8, 90.2, 0.1)
    union all select generate_series(44.8, 45.2, 0.1)
    union all select generate_series(29.8, 30.2, 0.1)
    union all select generate_series(-0.2, 0.2, 0.1)
    union all select generate_series(59.8, 60.2, 0.1))
SELECT x, cosd(x), sind(x), tand(x) FROM degrees as deg(x);

with degrees as (
    select generate_series((sqrt(3) / 3 - 0.1)::numeric, (sqrt(3) / 3 + 0.1)::numeric, 0.01)
    union all select generate_series((sqrt(3) / 2 - 0.1)::numeric, (sqrt(3) / 2 + 0.1)::numeric, 0.01)
    union all select generate_series(0.5 - 0.1, 0.5 + 0.1, 0.01)
    union all select generate_series(0, 0.1, 0.01)
    union all select generate_series(0.9, 1, 0.01))
select x, acosd(x), asind(x), atand(x) from degrees as deg(x);
Attached are the results of all those things if others want to have a look.
Regards,
-- 
Michael


degrees.sql
Description: Binary data
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 60b9a09..3716210 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -993,9 +993,9 @@
Finally,  shows the
available trigonometric functions.  All trigonometric functions
take arguments and return values of type double
-   precision. Trigonometric functions arguments are expressed
-   in radians. Inverse functions return values are expressed in
-   radians.  See unit transformation functions
+   precision.  Each of the trigonometric functions comes in
+   two varieties, one which 

Re: [HACKERS] Dangling Client Backend Process

2015-11-10 Thread Michael Paquier
On Wed, Nov 4, 2015 at 2:18 AM, Robert Haas wrote:
>
> The second conclusion does not appear to be correct.  parseInput()
> will call pqParseInput3() or pqParseInput2(), either of which will
> handle an error as if it were a notice - i.e. by printing it out.

Right per pqGetErrorNotice3 when the connection is in PGASYNC_IDLE state.

> Here's a patch based on that analysis, addressing just that one
> function, not any of the other changes talked about on this thread.
> Does this make sense?  Would we want to back-patch it, and if so how
> far, or just adjust master?  My gut is just master, but I don't know
> why this issue wouldn't also affect Hot Standby kills and maybe other
> kinds of connection termination situations, so maybe there's an
> argument for back-patching.  On the third hand, failing to read the
> error message off of a just-terminated connection isn't exactly a
> crisis of the first order either.

Looks sane to me. As the connection is in the PGASYNC_IDLE state when
crossing the path of pqHandleSendFailure(), we would finish eating up
all the error messages received from the server and print an internal
notice for the rest, along the lines of "message type blah received
from server while idle". Based on the lack of complaints regarding
libpq on this side, I would just go for master; for 9.5 it is pretty
late in the game, so better to let this gather some dust on master
before a potential backpatch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Per-table log_autovacuum_min_duration is actually documented

2015-11-10 Thread Michael Paquier
Hi all,

While going through the release notes of 9.5 I noticed the following
chunk in doc/src/sgml/release-9.5.sgml:
Add per-table autovacuum logging control via
log_min_autovacuum_duration (Michael Paquier)
NOT DOCUMENTED?

This is actually documented in src/sgml/ref/create_table.sgml with the
following paragraph, so this mention in the release notes does not seem
needed:
log_autovacuum_min_duration, toast.log_autovacuum_min_duration (integer)
Custom log_autovacuum_min_duration parameter.
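For reference, the per-table form is set like any other storage
parameter (table name purely illustrative):

ALTER TABLE my_table SET (log_autovacuum_min_duration = 0);
ALTER TABLE my_table SET (toast.log_autovacuum_min_duration = 1000);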

I recall that we had some talks about grouping all the relopts into a
single documentation section; perhaps not having one is at the origin
of the confusion?
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-11-10 Thread Michael Paquier
On Sun, Nov 1, 2015 at 9:34 PM, Dean Rasheed  wrote:
> On 27 October 2015 at 08:24, Dean Rasheed  wrote:
>> I think it's still feasible to have sind(30) = 0.5 exactly and keep
>> monotonicity
>>
>
> Here's a patch along those lines. It turned out to be fairly
> straightforward. It's still basically a thin wrapper on top of the
> math library trig functions, with a bit of careful scaling to ensure
> that curves join together to form continuous functions that are
> monotonic in the expected regions and return exact values in all the
> special cases 0,30,45,60,90,...
>
> I also modified some of the CHECKFLOATVAL() checks which didn't look
> right to me, unless there's some odd platform-specific behaviour that
> I'm not aware of, functions like sin and asin should never return
> infinity.

The alternative expected outputs for the float8 test need to be
updated; this is causing regression failures, particularly on win32,
where 3 digits are used for exponents and where tan('NaN') actually
results in an ERROR. See for example the attached regressions.diffs.
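As a quick sanity check of the exact special cases mentioned above,
something like the following should hold once the expected outputs are
fixed (an untested sketch):

SELECT sind(30), cosd(60), tand(45), asind(0.5), acosd(0.5), atand(1);
-- expected: 0.5, 0.5, 1, 30, 60, 45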
-- 
Michael


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Adam Brightwell
> I'm with Alvaro: the most interesting question here is why that mistake
> did not blow up on you immediately.  I thought we had enough safeguards
> in place to catch this type of error.

Ok, I'll explore that a bit further, as I was able to build it and use
it with my hook without any issue. :-/

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-10 Thread Adam Brightwell
>> +1 for adding to the next commitfest.
>>
> Me also.

Done.

-Adam


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-10 Thread Thomas Munro
Hi hackers,

Many sites use hot standby servers to spread read-heavy workloads over more
hardware, or at least would like to.  This works well today if your
application can tolerate some time lag on standbys.  The problem is that
there is no guarantee of when a particular commit will become visible for
clients connected to standbys.  The existing synchronous commit feature is
no help here because it guarantees only that the WAL has been flushed on
another server before commit returns.  It says nothing about whether it has
been applied or whether it has been applied on the standby that you happen
to be talking to.

A while ago I posted a small patch[1] to allow synchronous_commit to wait
for remote apply on the current synchronous standby, but (as Simon Riggs
rightly pointed out in that thread) that part isn't the main problem.  It
seems to me that the main problem for a practical 'writer waits' system is
how to support a dynamic set of servers, gracefully tolerating failures and
timing out laggards, while also providing a strong guarantee during any
such transitions.  Now I would like to propose something to do that, and
share a proof-of-concept patch.


=== PROPOSAL ===

The working name for the proposed feature is "causal reads", because it
provides a form of "causal consistency"[2] (and "read-your-writes"
consistency) no matter which server the client is connected to.  There is a
similar feature by the same name in another product (albeit implemented
differently -- 'reader waits'; more about that later).  I'm not wedded to
the name.

The feature allows arbitrary read-only transactions to be run on any hot
standby, with a specific guarantee about the visibility of preceding
transactions.  The guarantee is that if you set a new GUC "causal_reads =
on" in any pair of consecutive transactions (tx1, tx2) where tx2 begins
after tx1 successfully returns, then tx2 will either see tx1 or fail with a
new error "standby is not available for causal reads", no matter which
server it runs on.  A discovery mechanism is also provided, giving an
instantaneous snapshot of the set of standbys that are currently available
for causal reads (ie won't raise the error), in the form of a new column in
pg_stat_replication.

For example, a web server might run tx1 to insert a new row representing a
message in a discussion forum on the primary server, and then send the user
to another web page that runs tx2 to load all messages in the forum on an
arbitrary hot standby server.  If causal_reads = on in both tx1 and tx2
(for example, because it's on globally), then tx2 is guaranteed to see the
new post, or get a (hopefully rare) error telling the client to retry on
another server.
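In SQL terms, the example might look like this (the forum schema is
purely illustrative):

-- tx1, on the primary:
BEGIN;
SET causal_reads = on;
INSERT INTO messages (forum_id, body) VALUES (42, 'new post');
COMMIT;

-- tx2, on any hot standby, begun after tx1 has returned:
BEGIN;
SET causal_reads = on;
SELECT body FROM messages WHERE forum_id = 42;
-- either sees the new post, or fails with
-- "standby is not available for causal reads"
COMMIT;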

Very briefly, the approach is:
1.  The primary tracks apply lag on each standby (including between
commits).
2.  The primary deems standbys 'available' for causal reads if they are
applying WAL and replying to keepalives fast enough, and periodically sends
the standby an authorization to consider itself available for causal reads
until a time in the near future.
3.  Commit on the primary with "causal_reads = on" waits for all
'available' standbys either to apply the commit record, or to cease to be
'available' and begin raising the error if they are still alive (because
their authorizations have expired).
4.  Standbys can start causal reads transactions only while they have an
authorization with an expiry time in the future; otherwise they raise an
error when an initial snapshot is taken.

In a follow-up email I can write about the design trade-offs considered
(mainly 'writer waits' vs 'reader waits'), comparison with some other
products, method of estimating replay lag, wait and timeout logic and how
it maintains the guarantee in various failure scenarios, logic for standbys
joining and leaving, implications of system clock skew between servers, or
any other questions you may have, depending on feedback/interest (but see
comments in the attached patch for some of those subjects).  For now I
didn't want to clog up the intertubes with too large a wall of text.


=== PROOF-OF-CONCEPT ===

Please see the POC patch attached.  It adds two new GUCs.  After setting up
one or more hot standbys as per usual, simply add "causal_reads_timeout =
4s" to the primary's postgresql.conf and restart.  Now, you can set
"causal_reads = on" in some/all sessions to get guaranteed causal
consistency.  Expected behaviour: the causal reads guarantee is maintained
at all times, even when you overwhelm, kill, crash, disconnect, restart,
pause, add and remove standbys, and the primary drops them from the set it
waits for in a timely fashion.  You can monitor the system with the
replay_lag and causal_reads_status in pg_stat_replication and some state
transition LOG messages on the primary.  (The patch also supports
"synchronous_commit = apply", but it's not clear how useful that is in
practice, as already discussed.)
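Pulling the steps above together into a minimal recipe (using only the
GUCs and pg_stat_replication columns described in this proposal):

# primary's postgresql.conf:
causal_reads_timeout = 4s

-- in any session that needs the guarantee:
SET causal_reads = on;

-- monitoring, on the primary:
SELECT application_name, replay_lag, causal_reads_status
FROM pg_stat_replication;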

Lastly, a few notes about how this feature relates to some other 

[HACKERS] Re: Multixact slru doesn't force WAL flushes in SlruPhysicalWritePage()

2015-11-10 Thread Noah Misch
On Mon, Nov 09, 2015 at 10:40:07PM +0100, Andres Freund wrote:
>   /*
>* Optional array of WAL flush LSNs associated with entries in the SLRU
>* pages.  If not zero/NULL, we must flush WAL before writing pages 
> (true
>* for pg_clog, false for multixact, pg_subtrans, pg_notify).  
> group_lsn[]
>* has lsn_groups_per_page entries per buffer slot, each containing the
>* highest LSN known for a contiguous group of SLRU entries on that 
> slot's
>* page.
>*/
>   XLogRecPtr *group_lsn;
>   int lsn_groups_per_page;
> 
> Uhm. multixacts historically didn't need to follow the
> write-WAL-before-data rule because it was zapped at restart. But it's
> now persistent.
> 
> There are no comments about this choice anywhere in multixact.c, leading
> me to believe that this was not an intentional decision.

Here's the multixact.c comment justifying it:

 * XLOG interactions: this module generates an XLOG record whenever a new
 * OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
 * whenever a new MultiXactId is defined.  This allows us to completely
 * rebuild the data entered since the last checkpoint during XLOG replay.
 * Because this is possible, we need not follow the normal rule of
 * "write WAL before data"; the only correctness guarantee needed is that
 * we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
 * checkpoint is considered complete.  If a page does make it to disk ahead
 * of corresponding WAL records, it will be forcibly zeroed before use anyway.
 * Therefore, we don't need to mark our pages with LSN information; we have
 * enough synchronization already.

The comment's justification is incomplete, though.  What of pages filled over
the course of multiple checkpoint cycles?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error in char(n) example in documentation

2015-11-10 Thread Tom Lane
Peter Geoghegan  writes:
> I think that this example in the docs [1] is wrong:

Yeah, you're quite right.  Pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers