[HACKERS] Confusing comment in tidbitmap.c

2014-12-14 Thread David Rowley
The following comment above #define PAGES_PER_CHUNK in tidbitmap.c appears
to be incorrect:

But we
 * also want PAGES_PER_CHUNK to be a power of 2 to avoid expensive integer
 * remainder operations.  So, define it like this:

I don't quite follow this as it does nothing of the kind.
Check tbm_page_is_lossy() where we do: bitno = pageno % PAGES_PER_CHUNK;

Or am I missing something about the compiler optimizing that to: bitno =
pageno & 255; ?
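
For reference, a minimal sketch of the transformation in question (not the
actual tidbitmap.c code; PAGES_PER_CHUNK assumed to be its usual value of
256 for 8K blocks):

#define PAGES_PER_CHUNK 256

static unsigned int
bit_within_chunk(unsigned int pageno)
{
    /* For an unsigned operand and a power-of-2 divisor, compilers
     * rewrite the modulo as a bitwise AND, so this compiles to
     * pageno & 255 with no division instruction. */
    return pageno % PAGES_PER_CHUNK;
}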

Regards

David Rowley


Re: [HACKERS] speedup tidbitmap patch: cache page

2014-12-14 Thread David Rowley
On 23 October 2014 at 00:52, Teodor Sigaev teo...@sigaev.ru wrote:

 In specific workloads postgres could spend a lot of time in
 tbm_add_tuples, up to 50% of query time. The hash_search call is
 expensive and called twice for each ItemPointer to insert. The suggested
 patch tries to cache the last PagetableEntry pointer in the hope that the
 next ItemPointer will be on the same relation page.

 The patch is rather trivial and I don't believe that it could cause
 performance degradation, but it could increase performance on some
 workloads.

 An example:
 # create extension btree_gin;
 # select (v / 10)::int4 as i into t from generate_series(1, 500) as v;
 # create index idx on t using gin (i);
 # set enable_seqscan  = off;

 # explain analyze select * from t where i = 0;
 without patch: Execution time: 2427.692 ms
 with patch:    Execution time: 2058.718 ms

 # explain analyze select * from t where i >= 100 and i <= 100;
 without patch: Execution time: 524.441 ms
 with patch:    Execution time: 195.381 ms


This seems like quite a promising performance boost.

I've been having a look at this and I'm wondering about a certain scenario:

In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block,
and on the next iteration of the loop we have the same block again, have
you benchmarked any caching code to store if tbm_page_is_lossy() returned
true for that block on the previous iteration of the loop? This would save
from having to call tbm_page_is_lossy() again for the same block. Or are
you just expecting that tbm_page_is_lossy() returns true so rarely that
you'll end up caching the page most of the time, and gain on skipping both
hash lookups on the next loop, since page will be set in this case?
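
For concreteness, a rough sketch of the caching idea under discussion,
using function and field names from tidbitmap.c (the actual patch may
differ in detail):

    PagetableEntry *page = NULL;    /* cached across loop iterations */

    for (i = 0; i < ntids; i++)
    {
        BlockNumber blk = ItemPointerGetBlockNumber(tids + i);

        if (page == NULL || page->blockno != blk)
        {
            if (tbm_page_is_lossy(tbm, blk))
            {
                page = NULL;    /* whole page already lossy; skip */
                continue;
            }
            page = tbm_get_pageentry(tbm, blk); /* the lookup we cache */
        }
        /* ... set the bit for this tuple's offset in page->words ... */
    }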

It would be nice to see a comment to explain why it might be a good idea to
cache the page lookup. Perhaps something like:

+   /*
+    * We'll cache this page as it's quite likely that on the next loop
+    * we'll be seeing the same page again. This will save from having
+    * to look up the page in the hashtable again.
+    */
+   page = tbm_get_pageentry(tbm, blk);


I also wondered if there might be a small slowdown in the case where the
index only finds 1 matching tuple. So I tried the following:

select v::int4 as i into t1 from generate_series(1, 500) as v;
create index t1_idx on t1 using gin (i);

select count(*) from t1 where i = 0;

In this case tbm_add_tuples is called with ntids == 1, so there's only 1
loop performed per call. I wondered if the page == NULL check would add
much overhead to the function.

Times are in milliseconds:

Patch Master
2413.183 2339.979
2352.564 2428.025
2369.118 2359.498
2338.442 2350.974
2373.330 2400.942
2383.883 2342.461
2393.804 2359.49
2334.998 2359.884
2428.156 2520.842
2336.978 2356.995
avg. 2372.4456 2381.909 99.6%
med. 2371.224 2359.494 100.5%

It appears that if it does, then it's not very much.

Regards

David Rowley


Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 13/12/14 09:37, Craig Ringer wrote:

 Speaking as the originator of commitfests, they were *always* intended
 to be a temporary measure, a step on the way to something else like
 continuous integration.
 
 I'd really like to see the project revisit some of the underlying
 assumptions that're being made in this discussion:
 
 - Patches must be email attachments to a mailing list
 
 - Changes must be committed by applying a diff
 
 ... and take a look at some of the options a git-based workflow might
 offer, especially in combination with some of the tools out there that
 help track working branches, run CI, etc.
 
 Having grown used to push/pull workflows with CI integration I find the
 PostgreSQL patch workflow very frustrating, especially for larger
 patches. It's particularly annoying to see a patch series squashed into
 a monster patch whenever it changes hands or gets rebased, because it's
 being handed around as a great honking diff not a git working branch.
 
 Is it time to stop using git like CVS?

While not so active these days with PostgreSQL, I believe there is still
great scope for improving the workflow of reviewing and applying
patches. And while the commitfests were a good idea when they started,
it's obvious that in their current form they do not scale to meet the
current rate of development.

If I could name just one thing that I think would improve things it
would be submission of patches to the list in git format-patch format.
Why? Because it enables two things: 1) by definition patches are
small-chunked into individually reviewable pieces and 2) subject lines
allow list members to filter things that aren't relevant to them.

Here's an example of how the workflow works for the QEMU project which
I'm involved in, which also has similar issues in terms of numbers of
developers and supported platforms:


1) Developer submits a patch to the -devel list with git-format-patch

- For subsystems with listed maintainers in the MAINTAINERS file, the
maintainer must be CCd on the patchset (this is so that even if
developers decide to filter PATCH emails to the list, they still get the
CC copy)

- Patchsets must be iterative and fully bisectable, with clear comments
supplied in the commit message

- Any patchset with > 1 patch MUST have a cover letter

- All patches have a Signed-off-by indicating that the developer has
accepted the terms of the project license

- Each subject line should be in the format [PATCH] subsystem: one line
comment, to enable people to determine its relevance
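
As a concrete illustration of the submission step (standard git commands;
the list and maintainer addresses are placeholders):

$ git format-patch --cover-letter -v2 -o outgoing/ origin/master
$ git send-email --to=devel-list@example.org \
      --cc=subsystem-maintainer@example.org outgoing/*.patch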


2) Other developers start reviewing patches

There are several mini-workflows that tend to occur here so I'll try and
give some popular examples.

- If a patch is a build fix, one of the core committers will verify and
apply directly to the master repository

- If a maintainer is happy with the whole patchset, they reply to the
cover letter with a Reviewed-by and a line stating which subtree the
patchset has been applied to. Maintainers add a Signed-off-by to all
patches accepted via their subtree.

- A maintainer may indicate that the patch itself is fine, but the
commit message is unclear/incorrect and should be changed before being
resubmitted

- Any member of the mailing list may reply/review an individual patch by
hitting reply in their mail client. Patch comments are included
inline. If a reviewer is happy with an individual patch, they reply to
the patch and add a Reviewed-by line; anyone can provide a review,
even if they are not a maintainer

- For complex patches, a maintainer may request that the patchset be
split into further patchsets. In general patchsets of > 20-30 patches
tend to be frowned upon, and will often immediately result in a reply
saying "please break this down into smaller chunks"

- A maintainer may reply and say that while the patchset in its final
form needs more work, some clean-ups introduced by the patch are ready
to be applied, e.g. "please resubmit patches 1-4, 7 and 8 as a separate
patchset", which can then be applied directly to a subtree


3) Patchset tidy-up and submission

After the initial review, new versions of the patchset are generally
reposted to the list within a few days.

- The patch version is included in the header with the same subject
line, e.g. [PATCHv2] Add new feature

- The cover letter contains a mini-changelog giving the changes between
v1 and v2 e.g.

v2:
  Fix spelling mistake pointed out by Peter
  Change lock ordering requested by Andreas

- Reviewed-by tags for individual patches sent to the list are
appended to the commit message for that patch before resubmission


Once a maintainer has accepted a patch into their subtree, they send a
pull request to the core maintainers to apply their trees to the main
repository. I appreciate that currently merge commits aren't used in the
PostgreSQL git repository, so a pull request effectively becomes "rebase
this patchset onto master and push".

So why does this help patch review? From my experience I can 

Re: [HACKERS] Commitfest problems

2014-12-14 Thread Greg Sabino Mullane

 #2 is solved by my previous comments about giving the CFM/C the 
 authority. -Core could do that, they are in charge of release.

I don't think authority is the solution. Or certainly not one that
would work with an open source project like ours. What *would*
work is to identify and fix the friction points that prevent
people from joining, make the work harder than it needs to be,
and make people stop reviewing. I could quickly identify a handful
of things, chief among them the awful "link to the archives to
gather up all the patches" process. We have git, let's use it
as it was intended.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201412141011
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar

 I ran a test using postgres-US.fo built in the PostgreSQL source tree,
 which is 38 MB, and ran

 select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
 'http://www.w3.org/1999/XSL/Format']])) from data;

 (Table contains one row only.)

 The timings were basically indistinguishable between the three code
 versions.

 I'll try to reproduce your test.  How big is your file?  Do you have a
 link to the actual file?  Could you share your load script?


I use this xml sample:
http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

Basically I loaded the XML into table u 100 times. Load script attached.
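
For reference, a minimal sketch of what such a load script might look like
(the attached load_test.sql may differ; pg_read_file() requires superuser
and a path under the data directory):

CREATE TABLE u (x xml);
INSERT INTO u
    SELECT pg_read_file('productVolume_no_xsl.xml')::xml
    FROM generate_series(1, 100);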

Regards,
-- 
Ali Akbar


load_test.sql
Description: application/sql



Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-14 Thread Amit Kapila
On Sun, Dec 14, 2014 at 11:54 AM, Amit Kapila amit.kapil...@gmail.com
wrote:

 On Sat, Dec 13, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Andrew Dunstan and...@dunslane.net writes:
   On 11/20/2014 02:27 AM, Amit Kapila wrote:
   Now the part where I would like to receive feedback before revising the
   patch is on the coding style.  It seems to me from Tom's comments that
   he is not happy with the code, now I am not sure which part of the patch
   he thinks needs change.  Tom if possible, could you be slightly more
   specific about your concern w.r.t code?
 
   In view of the request above for comments from Tom, I have moved this
   back to Needs Review.
 
  Sorry, I was not paying very close attention to this thread and missed
  the request for comments.  A few such:
 
  1. The patch is completely naive about what might be in the symlink
  path string; eg embedded spaces in the path would break it.  On at
  least some platforms, newlines could be in the path as well.  I'm not
  sure about how to guard against this while maintaining human readability
  of the file.

 I will look into this and see what best can be done.


One way to deal with this could be to append a delimiter (one that is not
allowed in a tablespace path, like a quote (\')) at the end of the
tablespace path while writing it to the symlink label file, and then use
that as an end marker while reading it back from the file.  I think that
might defeat the human-readable aspect of this file to an extent, but I am
not sure if it is too important to keep it human readable.  I think even
if we want to provide some information to the user from internal files, it
is always better to provide it via some utility/tool.

Do we support newlines in tablespace paths?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Commitfest problems

2014-12-14 Thread Craig Ringer
On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.

I specifically objected to that being flattened into a single monster
patch when I saw that'd been done. If you look at my part in the work on
the row security patch, while I was ultimately unsuccessful in getting
the patch mergeable I spent quite a bit of time splitting it up into a
logical patch-series for sane review and development. I am quite annoyed
that it was simply flattened back into an unreviewable, hard-to-follow
blob and committed in that form.

It's not like development on a patch series is difficult. You commit
small fixes and changes, then you 'git rebase -i' and reorder them to
apply to the appropriate changesets. Or you can do a 'rebase -i' and in
'e'dit mode make amendments to individual commits. Or you can commit
'fixup!'s that get auto-squashed.
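
A minimal example of that workflow (standard git commands; the commit
subject is a placeholder):

$ git commit -a -m 'fixup! lmgr: reduce lock strength'  # fix for an earlier commit
$ git rebase -i --autosquash master                     # squashes the fixup! into place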

This is part of my grumbling about the use of git like it's still CVS.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Craig Ringer
On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:

 If I could name just one thing that I think would improve things it
 would be submission of patches to the list in git format-patch format.
 Why? Because it enables two things: 1) by definition patches are
 small-chunked into individually reviewable pieces and 2) subject lines
 allow list members to filter things that aren't relevant to them.

Personally I do just that. Even though the official docs say context
diff is preferred, I think most people now use unified diffs in practice,
from 'git diff'.

I'm surprised most people don't use git format-patch.

Note that we can't just git am a patch from git format-patch at the
moment, because policy is that the committer should be in the Author
field. The project would have to define a transition point where we
started using the git Author field for the author and the separate
Committer field, like git-am does by default. I'm not sure that'll ever
actually happen, or that it's even desirable.
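
For clarity, this is the default git-am behaviour being referred to (the
patch filename is a placeholder):

$ git am 0001-some-feature.patch
# The mail's From: header becomes the commit's Author; whoever runs
# git am is recorded separately as the Committer.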

 - Patchsets must be iterative and fully bisectable, with clear comments
 supplied in the commit message

Fully support this one.

Also in the smallest reasonable size divisions.

 - If a maintainer is happy with the whole patchset, they reply to the
 cover letter with a Reviewed-by and a line stating which subtree the
 patchset has been applied to. Maintainers add a Signed-off-by to all
 patches accepted via their subtree.

Sounds a lot like the kernel workflow (though in practice it seems like
the kernel folks tend to bypass all this mailing list guff and just use
direct pulls/pushes for lots of things).

 - Any member of the mailing list may reply/review an individual patch by
 hitting reply in their mail client. Patch comments are included
 inline. If a reviewer is happy with an individual patch, they reply to
 the patch and add a Reviewed-by line; anyone can provide a review,
 even if they are not a maintainer

This is fun with many mail clients that like to linewrap text bodies
and don't permit inline replies to attached text.

 6) Smaller patches are applied more often
 
 By breaking large patches down into smaller chunks, even if the main
 feature itself isn't ready to be applied to the main repository then a
 lot of the preceding patches laying down the groundwork can.

I think PostgreSQL does OK on smaller patches - at least sometimes.
There can be a great deal of bikeshedding and endless arguments,
back-and-forth about utility, in-tree users, etc, but no formal process
is ever going to change that. The process of getting an uncontroversial
patch into Pg is generally painless, if often rather slow unless a
committer sees it and commits it immediately upon it being posted.

 The benefits here are that people with large out-of-tree patches 

I'm not sure we have all that many people in that category - though I'm
a part of a team that most certainly does fit the bill.

Even the small patches for BDR have been challenging and often
contentious though.

 Since as large features get
 implemented as a series of smaller patches

A big barrier here is the "we must have in-tree users" thing, which
makes it hard to do iterative work on a finer-grained scale.

Some C-level unit tests that let us exercise small units of
functionality might ease those complaints. Right now the closest we have
to that is contrib/test_[blah] which is only useful for public API and
even then quite limited.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Custom timestamp format in logs

2014-12-14 Thread Magnus Hagander
On Sat, Dec 13, 2014 at 3:50 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 Hi all,

 This week, we heard about a user willing to use a custom timestamp
 format across a set of services to improve the debugability of the
 whole set, Postgres being one of them. Unfortunately datestyle does
 not take into account the logs. Would it be worth adding a new GUC
 able to control the timestamp format in the logs?

 We could still redirect the logs with syslog and have a custom
 timestamp format there, but in the case of this particular user syslog
 was a no-go. Looking at the code, timestamp format is now hardcoded in
 setup_formatted_log_time and setup_formatted_start_time when calling
 pg_strftime @ elog.c, so it doesn't seem to be too complicated to do.

 Opinions? This thread is here for that.


A separate GUC seems kind of weird. Wouldn't it be better with something
like %(format)t or such in the log_line_prefix itself in that case? That
could also be expanded to other parameters, should we need them?
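
To illustrate, such a hypothetical syntax might look like this in
postgresql.conf (not an existing option; a strftime-style format string
is assumed):

log_line_prefix = '%(%Y-%m-%dT%H:%M:%S%z)t [%p]: '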

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Custom timestamp format in logs

2014-12-14 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sat, Dec 13, 2014 at 3:50 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 This week, we heard about a user willing to use a custom timestamp
 format across a set of services to improve the debugability of the
 whole set, Postgres being one of them. Unfortunately datestyle does
 not take into account the logs. Would it be worth adding a new GUC
 able to control the timestamp format in the logs?

 A separate GUC seems kind of weird. Wouldn't it be better with something
 like %(format)t or such in the log_line_prefix itself in that case? That
 could also be expanded to other parameters, should we need them?

TBH, my answer to the rhetorical question is no.  There is nothing
weird about the timestamps %t emits now, and no reason why they should
need to be configurable, except that somebody thinks it's easier to
lobby us to complicate our software than to fix whatever they have that
can't consume standard timestamp format.

regards, tom lane




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 15:51, Craig Ringer wrote:

 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 
 It's not like development on a patch series is difficult. You commit
 small fixes and changes, then you 'git rebase -i' and reorder them to
 apply to the appropriate changesets. Or you can do a 'rebase -i' and in
 'e'dit mode make amendments to individual commits. Or you can commit
 'fixup!'s that get auto-squashed.
 
 This is part of my grumbling about the use of git like it's still CVS.

I just want to add that I'm always grateful for the time that yourself
and all committers put into PostgreSQL, and my example of RLS was really
to signify "big feature X" rather than to pick on a particular aspect of
that patch. I apologise if you thought I was in any way criticising the
work that you, or anyone else, put into this feature.

The argument I wanted to make is that if someone starts seeing a problem
with a current build of PostgreSQL and git bisects it down to a
particular commit, then smaller, iterative commits are extremely helpful
in this regard.

When searching for a needle in a haystack, there is a very big difference
between an "add big feature X" haystack and a "change struct alignment
for Y" haystack, the latter being implicit in having smaller,
iterative patchsets.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.

 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.

TBH, I'm not really on board with this line of argument.  I don't find
broken-down patches to be particularly useful for review purposes.  An
example I was just fooling with this week is the GROUPING SETS patch,
which was broken into three sections for no good reason at all.  (The
fourth and fifth subpatches, being alternative solutions to one problem,
are in a different category of course.)  Too often, decisions made in
one subpatch don't make any sense until you see the larger picture.

Also, speaking of the larger picture: the current Postgres revision
history amounts to 37578 commits (as of sometime yesterday) --- and that's
just in the HEAD branch.  If we'd made an effort to break feature patches
into bite-size chunks like you're recommending here, we'd probably have
easily half a million commits in the mainline history.  That would not be
convenient to work with, and I really doubt that it would be more useful
for git bisect purposes, and I'll bet a large amount of money that most
of them would not have had commit messages composed with any care at all.

regards, tom lane




Re: [HACKERS] Confusing comment in tidbitmap.c

2014-12-14 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 The following comment above #define PAGES_PER_CHUNK in tidbitmap.c appears
 to be incorrect:

 But we
  * also want PAGES_PER_CHUNK to be a power of 2 to avoid expensive integer
  * remainder operations.  So, define it like this:

 I don't quite follow this as it does nothing of the kind.
 Check tbm_page_is_lossy() where we do: bitno = pageno % PAGES_PER_CHUNK;

 Or am I missing something about the compiler optimizing that to: bitno =
 pageno & 255; ?

Exactly.  Any C compiler ever written will do that.

regards, tom lane




Re: [HACKERS] proposal: plpgsql - Assert statement

2014-12-14 Thread Pavel Stehule
Hi

any comments to last proposal and patch?

Pavel

2014-11-26 19:52 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 2014-11-26 16:46 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:



 2014-11-26 13:31 GMT+01:00 Marko Tiikkaja ma...@joh.to:

 On 11/26/14 8:55 AM, Pavel Stehule wrote:

  * should assertions be globally enabled/disabled? - I have no personal
  preference on this question.


 I think so.  The way I would use this function is to put expensive
 checks into strategic locations which would only run when developing
 locally (and additionally perhaps in one of the test environments.)  And in
 that case I'd like to globally disable them for the live environment.


 ok



  * can an ASSERT exception be handled? - I prefer this to be an
 unhandled exception - like query_canceled - because it should not be
 masked by plpgsql EXCEPTION WHEN OTHERS ...


 I don't care much either way, as long as we get good information about
 what went wrong.  A stack trace and hopefully something like
 print_strict_params for parameters to the expr.


 There are more ways; I can live with both


 here is a proof of concept

 what do you think about it?

 Regards

 Pavel



 Pavel





 .marko






Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 15:57, Craig Ringer wrote:

 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 
 If I could name just one thing that I think would improve things it
 would be submission of patches to the list in git format-patch format.
 Why? Because it enables two things: 1) by definition patches are
 small-chunked into individually reviewable pieces and 2) subject lines
 allow list members to filter things that aren't relevant to them.
 
 Personally I do just that. Even though the official docs say context
 diff is preferred, I think most people now use unified diffs in practice,
 from 'git diff'.

 I'm surprised most people don't use git format-patch.
 
 Note that we can't just git am a patch from git format-patch at the
 moment, because policy is that the committer should be in the Author
 field. The project would have to define a transition point where we
 started using the git Author field for the author and the separate
 Committer field, like git-am does by default. I'm not sure that'll ever
 actually happen, or that it's even desirable.

Sure, I appreciate that. The part I particularly wanted to emphasise
here was the use of a system identifier as part of the subject line,
e.g. say if there was a hypothetical patchset which sped up PostgreSQL
by 20% then you could see subjects like this:

[PATCH 1/x] lmgr: reduce lock strength for LWLock
[PATCH 2/x] wal: change lock ordering for WAL writes
[PATCH 3/x] optimiser: exit early if lock unobtainable

While these are obviously unrealistic, what I hope here is that people
with interest in these areas would take note of individual patches even
if they aren't interested in the entire patchset.

So maybe since Andreas did some recent locking work, he spots "lmgr" in
the subject and then makes a note to review that part of the patch.
Similarly Heikki could spot "wal" and make a note to look at that, whilst
Tom would zero in on the optimiser changes.

The aim here is that no one person needs to sit and initially review a
complete patch before returning feedback to the developer.

 - Patchsets must be iterative and fully bisectable, with clear comments
 supplied in the commit message
 
 Fully support this one.
 
 Also in the smallest reasonable size divisions.

I should add here that the QEMU folk do tend to go to great lengths to
preserve bisectability; often intermediate compatibility APIs are
introduced early in the patchset and then removed at the very end when
the final feature is implemented.

 - If a maintainer is happy with the whole patchset, they reply to the
 cover letter with a Reviewed-by and a line stating which subtree the
 patchset has been applied to. Maintainers add a Signed-off-by to all
 patches accepted via their subtree.
 
 Sounds a lot like the kernel workflow (though in practice it seems like
 the kernel folks tend to bypass all this mailing list guff and just use
 direct pulls/pushes for lots of things).

Yes indeed - mainly from the people on the KVM module side. Again I'm
not saying that this workflow is correct for PostgreSQL, I was trying to
use this an example to explain *why* the workflow is done in this manner
and how it helps developers such as myself.

 - Any member of the mailing list may reply/review an individual patch by
 hitting reply in their mail client. Patch comments are included
 inline. If a reviewer is happy with an individual patch, they reply to
 the patch and add a Reviewed-by line; anyone can provide a review,
 even if they are not a maintainer
 
 This is fun with many mail clients that like to linewrap text bodies
 and don't permit inline replies to attached text.

Wow really? I can't say I've seen this in practice for a long time but I
defer to your experience here.

 6) Smaller patches are applied more often

 By breaking large patches down into smaller chunks, even if the main
 feature itself isn't ready to be applied to the main repository then a
 lot of the preceding patches laying down the groundwork can.
 
 I think PostgreSQL does OK on smaller patches - at least sometimes.
 There can be a great deal of bikeshedding and endless arguments,
 back-and-forth about utility, in-tree users, etc, but no formal process
 is ever going to change that. The process of getting an uncontroversial
 patch into Pg is generally painless, if often rather slow unless a
 committer sees it and commits it immediately upon it being posted.

I agree that trivial patches do tend to get picked up reasonably
quickly. It strikes me that it may prevent patches slipping through the
cracks if someone were officially nominated as a "trivial patch monitor",
i.e. someone for developers to ping if their patch has been ignored -
but it sounds like this is not such an issue in practice.

 The benefits here are that people with large out-of-tree patches 
 
 I'm not sure we have all that many people in that category - though I'm
 a part of a team that most certainly does fit the bill.
 
 Even the small patches for BDR have been challenging and often
 contentious 

Re: [HACKERS] Commitfest problems

2014-12-14 Thread Andrew Dunstan


On 12/14/2014 12:05 PM, Tom Lane wrote:

Craig Ringer cr...@2ndquadrant.com writes:

On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:

Compare this to say, for example, huge patches such as RLS.

I specifically objected to that being flattened into a single monster
patch when I saw that'd been done. If you look at my part in the work on
the row security patch, while I was ultimately unsuccessful in getting
the patch mergeable I spent quite a bit of time splitting it up into a
logical patch-series for sane review and development. I am quite annoyed
that it was simply flattened back into an unreviewable, hard-to-follow
blob and committed in that form.

TBH, I'm not really on board with this line of argument.  I don't find
broken-down patches to be particularly useful for review purposes.  An
example I was just fooling with this week is the GROUPING SETS patch,
which was broken into three sections for no good reason at all.  (The
fourth and fifth subpatches, being alternative solutions to one problem,
are in a different category of course.)  Too often, decisions made in
one subpatch don't make any sense until you see the larger picture.

Also, speaking of the larger picture: the current Postgres revision
history amounts to 37578 commits (as of sometime yesterday) --- and that's
just in the HEAD branch.  If we'd made an effort to break feature patches
into bite-size chunks like you're recommending here, we'd probably have
easily half a million commits in the mainline history.  That would not be
convenient to work with, and I really doubt that it would be more useful
for git bisect purposes, and I'll bet a large amount of money that most
of them would not have had commit messages composed with any care at all.


I have tried to stay away from this thread, but ...

I'm also quite dubious about this suggested workflow, partly for the 
reasons Tom gives, and partly because it would constrain the way I work. 
I tend to commit with little notes to myself in the commit logs, notes 
that are never intended to become part of the public project history. I 
should be quite sad to lose that.


As for using git bisect, usually when I do this each iteration is quite 
expensive. Multiplying the number of commits by a factor between 10 and 
100, which is what I think this would involve, would just make git 
bisect have to do between 3 and 7 more iterations, ISTM. That's not a win.


On the larger issue, let me just note that I don't believe we have what 
is fundamentally a technological problem, and while technological 
changes can of course sometimes make things easier, they can also blind 
us to the more basic problems we are facing.


cheers

andrew




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-14 Thread Alvaro Herrera
Amit Kapila wrote:

 One way to deal with this could be to append a delimiter(which is not
 allowed
 in tablespace path like quote (\')) at the end of tablespace path while
 writing the same to symlink label file and then use that as end marker while
 reading it from file.

Some GNU tools such as xargs and find use a null char as item delimiter;
see find -print0 and xargs -0.  IIRC one of our tools also allows that
(psql?).  Doing the same here would make human reading a bit more
difficult, but not completely impossible.
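
For example (standard GNU tool behaviour):

$ find /path/to/tablespaces -maxdepth 1 -print0 | xargs -0 ls -ld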

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Custom timestamp format in logs

2014-12-14 Thread Alvaro Herrera
Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Sat, Dec 13, 2014 at 3:50 PM, Michael Paquier michael.paqu...@gmail.com
  wrote:
  This week, we heard about a user willing to use a custom timestamp
  format across a set of services to improve the debugability of the
  whole set, Postgres being one of them. Unfortunately datestyle does
  not take into account the logs. Would it be worth adding a new GUC
  able to control the timestamp format in the logs?
 
  A separate GUC seems kind of weird. Wouldn't it be better with something
  like %(format)t or such in the log_line_prefix itself in that case? That
  could also be expanded to other parameters, should we need them?
 
 TBH, my answer to the rhetorical question is no.  There is nothing
 weird about the timestamps %t emits now, and no reason why they should
 need to be configurable, except that somebody thinks it's easier to
 lobby us to complicate our software than to fix whatever they have that
 can't consume standard timestamp format.

I imagine pgBadger/pgFouine wouldn't be happy with the timestamp being
infinitely configurable.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: plpgsql - Assert statement

2014-12-14 Thread Alvaro Herrera
Pavel Stehule wrote:
 Hi
 
 any comments to last proposal and patch?

WTH is that pstrdup("hint is null") thing?  Debugging leftover?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 17:05, Tom Lane wrote:

 Craig Ringer cr...@2ndquadrant.com writes:
 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.

The way in which this is normally handled is via the cover letter which
is going to be nicely captured in the mail archives; typically a cover
letter for a patchset consists of several paragraphs along the lines of
"patches 1-3 do some re-org work, patch 4 reworks the Y API for new
feature X implemented in patches 9-12".

As an example take a look at the cover letter for this patch submitted
over the past few days:
http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01990.html. It
seems to me that this would give the level of detail which you are
looking for.

I agree that in some cases patches could definitely be broken into
pieces that are too fine, but then I think it's perfectly okay for
committers to turn around and request the series be re-submitted in more
sensible chunks if they feel things have gone too far the other way.

 Also, speaking of the larger picture: the current Postgres revision
 history amounts to 37578 commits (as of sometime yesterday) --- and that's
 just in the HEAD branch.  If we'd made an effort to break feature patches
 into bite-size chunks like you're recommending here, we'd probably have
 easily half a million commits in the mainline history.  That would not be
 convenient to work with, and I really doubt that it would be more useful
 for git bisect purposes, and I'll bet a large amount of money that most
 of them would not have had commit messages composed with any care at all.

Having worked on a few kernel patches myself, git is surprisingly
effective on huge repositories once the cache is warmed up, so I don't
feel this would be an issue with a PostgreSQL git repository, which will
be many orders of magnitude smaller.

In terms of commit messages, I don't know if you missed that part of my
original response to Craig but it's considered normal for a maintainer
to reject a patch because of an incorrect/irrelevant commit message.

So if I submitted a patch that fixed a particular bug but you as a
committer weren't happy with the commit message, then I'm perfectly okay
with you asking me to resubmit with updated/corrected patch message wording.


ATB,

Mark.





Re: [HACKERS] Custom timestamp format in logs

2014-12-14 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 TBH, my answer to the rhetorical question is no.  There is nothing
 weird about the timestamps %t emits now, and no reason why they should
 need to be configurable, except that somebody thinks it's easier to
 lobby us to complicate our software than to fix whatever they have that
 can't consume standard timestamp format.

 I imagine pgBadger/pgFouine wouldn't be happy with the timestamp being
 infinitely configurable.

Yeah, if the repercussions were confined to the server code that would be
one thing, but there are a number of external tools that would be affected
as well.  The complication cost has to be thought about that way rather
than evaluated in isolation.

You could argue that pgBadger et al could just document that they don't
support nonstandard timestamp formats ... but then it's really unclear why
we're shifting the complexity burden in this direction rather than asking
why the one proprietary application that wants the other thing can't cope
with the existing format choice.

regards, tom lane




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Peter Geoghegan
On Sun, Dec 14, 2014 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.

It sounds like they didn't use the technique effectively, then. I
think it will be useful to reviewers that I've broken out the
mechanism through which the ON CONFLICT UPDATE patch accepts the
EXCLUDED.* pseudo-alias into a separate commit (plus docs in a
separate commit, as well as tests) - it's a non-trivial additional
piece of code that it wouldn't be outrageous to omit in a release, and
isn't particularly anticipated by earlier cumulative commits. Maybe
people don't have a good enough sense of what sort of subdivision is
appropriate yet. I think that better style will emerge over time.

Of course, if you still prefer a monolithic commit, it's pretty easy
to produce one from a patch series. It's not easy to go in the other
direction, though.
-- 
Peter Geoghegan




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 17:30, Andrew Dunstan wrote:

 On 12/14/2014 12:05 PM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.

 Also, speaking of the larger picture: the current Postgres revision
 history amounts to 37578 commits (as of sometime yesterday) --- and
 that's
 just in the HEAD branch.  If we'd made an effort to break feature patches
 into bite-size chunks like you're recommending here, we'd probably have
 easily half a million commits in the mainline history.  That would not be
 convenient to work with, and I really doubt that it would be more useful
 for git bisect purposes, and I'll bet a large amount of money that most
 of them would not have had commit messages composed with any care at all.
 
 I have tried to stay away from this thread, but ...
 
 I'm also quite dubious about this suggested workflow, partly for the
 reasons Tom gives, and partly because it would constrain the way I work.
 I tend to commit with little notes to myself in the commit logs, notes
 that are never intended to become part of the public project history. I
 should be quite sad to lose that.

Interestingly enough, I tend to work in a very similar way to this. When
submitting patches upstream, I tend to rebase on a new branch and then
squash/rework as required.

One thing I do tend to find is that once I start rebasing the patches
for upstream, I find that many of my personal notes can be rewritten as
part of the patch comment (I would like to think that comments useful to
myself will be useful to other developers some day). Once the rebase is
complete, often I find that I have no non-public comments left so that
problem becomes almost non-existent.

 As for using git bisect, usually when I do this each iteration is quite
 expensive. Multiplying the number of commits by a factor between 10 and
 100, which is what I think this would involve, would just make git
 bisect have to do between 3 and 7 more iterations, ISTM. That's not a win.

I find that this isn't too bad in practice. If you think about
monolithic patches during a commitfest, you can imagine that most of
them will touch one of the core .h files which will require most things
to be rebuilt once applied during bisection.

With smaller iterative patchsets, each patch tends to only touch a
handful of files, and so "make install" generally only needs to rebuild
and link a very small number of files, which is reasonably quick. Since
all of the changes to global header files are contained in 1-2 patches,
the number of complete rebuilds isn't as many as you might expect.
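
For what it's worth, an automated bisect over such a history would look
something like this (the repro script is a placeholder):

$ git bisect start HEAD REL9_3_STABLE
$ git bisect run sh -c 'make -s -j4 install && ./repro-bug.sh'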

 On the larger issue, let me just note that I don't believe we have what
 is fundamentally a technological problem, and while technological
 changes can of course sometimes make things easier, they can also blind
 us to the more basic problems we are facing.

Absolutely. I firmly believe that the right tools for the job are
available; it's really a matter of finding a workflow solution that works
well for everyone involved in the project.


ATB,

Mark.





Re: [HACKERS] proposal: plpgsql - Assert statement

2014-12-14 Thread Pavel Stehule
2014-12-14 18:57 GMT+01:00 Alvaro Herrera alvhe...@2ndquadrant.com:

 Pavel Stehule wrote:
  Hi
 
  any comments to last proposal and patch?

 WTH is that pstrdup("hint is null") thing?  Debugging leftover?


No, it is just a poorly worded error message. I propose the syntax ASSERT
boolean_expression [, text expression ] -- the text expression is a hint
for assertion debugging.

This text expression should not be null when it is used. I am not sure
what the right behaviour is - so when an assertion fails and the text
expression is null, I use the message "hint is null" as the hint.
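
Under the proposed syntax, usage would look roughly like this (a sketch of
the proposal, not committed behaviour):

DO $$
BEGIN
    ASSERT 1 > 2, 'this hint is reported when the assertion fails';
END;
$$;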

Regards

Pavel



 --
 Álvaro Herrerahttp://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 18:24, Peter Geoghegan wrote:

 On Sun, Dec 14, 2014 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.
 
 It sounds like they didn't use the technique effectively, then. I
 think it will be useful to reviewers that I've broken out the
 mechanism through which the ON CONFLICT UPDATE patch accepts the
 EXCLUDED.* pseudo-alias into a separate commit (plus docs in a
 separate commit, as well as tests) - it's a non-trivial additional
 piece of code that it wouldn't be outrageous to omit in a release, and
 isn't particularly anticipated by earlier cumulative commits. Maybe
 people don't have a good enough sense of what sort of subdivision is
 appropriate yet. I think that better style will emerge over time.

For me this is a great example of how to get more developers involved in
patch review. Imagine that I'm an experienced developer with little
previous exposure to PostgreSQL, but with a really strong flex/bison
background.

If someone posts a patch to the list as a single grouping sets patch,
then I'm going to look at that and think "I have no way of understanding
this" and do nothing with it.

However if it were posted as part of patchset with a subject of [PATCH]
gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique
my interest enough to review the changes to the grammar rules, with the
barrier for entry reduced to understanding just the PostgreSQL-specific
parts.

What should happen over time is that developers like this would earn the
trust of the committers, so committers can spend more time reviewing the
remaining parts of the patchset. And of course the project has now
engaged a new developer in the community who otherwise may not have
participated.

 Of course, if you still prefer a monolithic commit, it's pretty easy
 to produce one from a patch series. It's not easy to go in the other
 direction, though.

Agreed.


ATB,

Mark.





Re: [HACKERS] BRIN range operator class

2014-12-14 Thread Emre Hasegeli
 I thought we can do better than minmax for the inet data type,
 and ended up with a generalized opclass supporting both inet and range
 types.  Patch based on minmax-v20 attached.  It works well except
 a few small problems.  I will improve the patch and add into
 a commitfest after BRIN framework is committed.

I wanted to send a new version before the commitfest to get some
feedback, but it is still work in progress.  Patch attached, rebased
onto the current HEAD.  This version supports more operators and
box from geometric data types.  Opclasses are renamed to inclusion_ops
to be more generic.  The problems I mentioned remain because I
couldn't solve them without touching the BRIN framework.
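
Hypothetical usage once the opclass is available (the opclass name is
assumed from the patch's naming scheme):

CREATE TABLE hosts (addr inet);
CREATE INDEX hosts_addr_brin ON hosts USING brin (addr inet_inclusion_ops);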

 To support more operators I needed to change amstrategies and
 amsupport on the catalog.  It would be nice if amsupport can be set
 to 0 like amstrategies.

I think it would be nicer to get the functions from the operators
with using the strategy numbers instead of adding them directly as
support functions.  I looked around a bit but couldn't find
a sensible way to support it.  Is it possible without adding them
to the RelationData struct?

 Inet data types accept IP version 4 and version 6.  It isn't possible
 to represent union of addresses from different versions with a valid
 inet type.  So, I made the union function return NULL in this case.
 Then, I tried to store whether the returned value is NULL or not in
 column->values[] as a boolean, but it failed on the pfree() inside
 brin_dtuple_initialize().  It doesn't seem right to free the values
 based on attr->attbyval.

This problem remains.  There is also a similar problem with the
range types, namely empty ranges.  There should be special cases
for them on some of the strategies.  I tried to solve the problems
in several different ways, but got a segfault one line or another.
This makes me think that BRIN framework doesn't support to store
different types than the indexed column in the values array.
For example, brin_deform_tuple() iterates over the values array and
copies them using the length of the attr on the index, not the length
of the type defined by the OpcInfo function.  If storing other types
isn't supported, why is it required to return OIDs from the OpcInfo
function?  I am confused.

I didn't try to support geometric types other than box as I couldn't
manage to store a different type in the values array, but it would
be nice to get some feedback about the overall design.  I was
thinking to add a STORAGE parameter to the index to support other
geometric types.  I am not sure that adding the STORAGE parameter
to be used by the opclass implementation is the right way.  It
wouldn't be the actual thing that is stored by the index, it will be
an element in the values array.  Maybe, data type specific opclasses
is the way to go, not a generic one as I am trying.


brin-inclusion-v02.patch
Description: Binary data



Re: [HACKERS] Commitfest problems

2014-12-14 Thread Craig Ringer
On 12/15/2014 02:46 AM, Mark Cave-Ayland wrote:
 Interestingly enough, I tend to work in a very similar way to this. When
 submitting patches upstream, I tend to rebase on a new branch and then
 squash/rework as required.

Same here, and I find it works really well ... when I do it properly.

I usually have a private development branch that's full of fixup! 
commits, WIPs, awful commit messages, etc.

Then I have the tree that's been rebased, squashed, and tidied up.

I periodically tidy my latest work and replace the old clean tree with
it, then start my new development branch on top of the new clean tree.

This gives me a somewhat nice looking, well ordered patch series to work
on top of, while retaining the flexibility to do lots of quick fixes etc.

Admittedly, sometimes the development tree gets a bit large and it takes
a while before I give it a proper cleanup - my current project being very
much in that category. Still, it works well in general.

 I find that this isn't too bad in practice. If you think about
 monolithic patches during a commitfest, you can imagine that most of
 them will touch one of the core .h files which will require most things
 to be rebuilt once applied during bisection.

It's not build time, it's test-run time. Especially if you can't
automate the test, or it isn't practical to.

That's a legitimate concern - but I don't think we'd see a blowout of
patch counts to quite the same degree.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 20:07, Craig Ringer wrote:

 On 12/15/2014 02:46 AM, Mark Cave-Ayland wrote:
 Interestingly enough, I tend to work in a very similar way to this. When
 submitting patches upstream, I tend to rebase on a new branch and then
 squash/rework as required.
 
 Same here, and I find it works really well ... when I do it properly.
 
 I usually have a private development branch that's full of fixup! 
 commits, WIPs, awful commit messages, etc.
 
 Then I have the tree that's been rebased, squashed, and tided up.
 
 I periodically tidy my latest work and replace the old clean tree with
 it, then start my new development branch on top of the new clean tree.
 
 This gives me a somewhat nice looking, well ordered patch series to work
 on top of, while retaining the flexibility to do lots of quick fixes etc.
 
 Admittedly, sometimes the development tree gets a bit large and it takes
 a while before I give it a proper cleanup. My current project being very
 much in that category. Still - it works well in general.

That describes my workflow fairly well too.

 I find that this isn't too bad in practice. If you think about
 monolithic patches during a commitfest, you can imagine that most of
 them will touch one of the core .h files which will require most things
 to be rebuilt once applied during bisection.
 
 It's not build time, it's test-run time. Especially if you can't
 automate the test, or it isn't practical to.
 
 That's a legitimate concern - but I don't think we'd see a blowout of
 patch counts to quite the same degree.

At the end of the day, each project finds its own level as to how
complex individual patches should be and what should comprise a
patchset. Over the past couple of years the QEMU process has evolved
into its current form as maintainers and developers have figured out
what works best for them, and I don't see that PostgreSQL would be any
different in this respect - it takes time to adapt to a new workflow and
get it right.

Having worked on various parts of PostGIS for around 10 years, I've had
my head stuck into various parts of the GiST code and have got to know a
few parts of it really well.

What I find frustrating is that I've come back from a workflow where
I've been reviewing/testing patches within months of joining a project
because the barrier for entry has been so low, and yet even with much
longer experience of the PostgreSQL codebase I feel I can't do the same
for patches submitted to the commitfest.

And why is this? It's because, while I know very specific areas of the
code well, many patches span different areas of the source tree of which
I have little or no experience, which, when they are supplied as a single
monolithic patch, makes it impossible for me to give meaningful review to
all but a tiny proportion of them.

I believe that this is one of the main reasons why people struggle to
find patch reviewers, with the consequence being that the majority of
work falls back onto the CF manager and the committers which is why we
have the current situation.


ATB,

Mark.





Re: [HACKERS] Custom timestamp format in logs

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 1:36 AM, Magnus Hagander mag...@hagander.net wrote:
 On Sat, Dec 13, 2014 at 3:50 PM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 Hi all,

 This week, we heard about a user willing to use a custom timestamp
 format across a set of services to improve the debugability of the
 whole set, Postgres being one of them. Unfortunately datestyle does
 not take into account the logs. Would it be worth adding a new GUC
 able to control the timestamp format in the logs?

 We could still redirect the logs with syslog and have a custom
 timestamp format there, but in the case of this particular user syslog
 was a no-go. Looking at the code, timestamp format is now hardcoded in
 setup_formatted_log_time and setup_formatted_start_time when calling
 pg_strftime @ elog.c, so it doesn't seem to be much complicated to do.

 Opinions? This thread is here for that.


 A separate GUC seems kind of weird.
Check.

 Wouldn't it be better with something like %(format)t or such in the 
 log_line_prefix itself in that case?
 That could also be expanded to other parameters, should we need them?
Possible. I am not sure we would be able to make a new log_line_prefix
parameter as flexible for timestamp formatting, though.
-- 
Michael
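
For illustration of the idea being discussed, here is a minimal standalone
C sketch. It is not the actual elog.c code; log_timestamp_format stands in
for whatever hypothetical GUC or %(format)t escape would supply the format
string, and plain strftime() stands in for pg_strftime():

#include <stdio.h>
#include <string.h>
#include <time.h>

int
main(void)
{
    /* In the backend this would come from the hypothetical setting;
     * it is simply hardcoded here. */
    const char *log_timestamp_format = "%Y-%m-%dT%H:%M:%S %Z";
    char        buf[128];
    time_t      now = time(NULL);
    struct tm   tm;

    localtime_r(&now, &tm);
    if (strftime(buf, sizeof(buf), log_timestamp_format, &tm) == 0)
        strcpy(buf, "(bad timestamp format)");
    printf("%s [12345] LOG:  example message\n", buf);
    return 0;
}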




Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-14 22:18 GMT+07:00 Ali Akbar the.ap...@gmail.com:


 I ran a test using postgres-US.fo built in the PostgreSQL source tree,
 which is 38 MB, and ran

 select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
 'http://www.w3.org/1999/XSL/Format']])) from data;

 (Table contains one row only.)

 The timings were basically indistinguishable between the three code
 versions.

 I'll try to reproduce your test.  How big is your file?  Do you have a
 link to the actual file?  Could you share your load script?


 I use this xml sample:
 http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

 Basically i loaded the xml to table u 100 times. Load script attached.


Peter, while reviewing the better-performing patch myself, I now think the
patch needs more work to be committed. The structuring of the method will
be confusing in the long term. I think I'll restructure the patch in the
next commitfest.

So I propose to break the patch in two:
1. We apply the current patch which uses xmlNodeCopy, so that the
long-standing bug will be fixed in postgres.
2. I'll work on the performance enhancement in the next commitfest.

Maybe for (2), the current better-performing patch can be viewed as a PoC
of the expected performance.

Regards,
-- 
Ali Akbar


Re: [HACKERS] Status of CF 2014-10 and upcoming 2014-12

2014-12-14 Thread Michael Paquier
On Sun, Dec 14, 2014 at 12:29 AM, Andrew Dunstan and...@dunslane.net wrote:
 Opinions?
 I will look at clearing out a few of the Ready For Committer patches today
 and tomorrow.
Thank you.
-- 
Michael




Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-14 Thread Michael Paquier
On Tue, Dec 9, 2014 at 2:52 AM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Dec 8, 2014 at 9:43 AM, Peter Geoghegan p...@heroku.com wrote:
 I think it's very possible that the wrong alias may be provided by the
 user, and that we should consider that when providing a hint.

 Note that the existing mechanism (the mechanism that I'm trying to
 improve) only ever shows this error message:

 There is a column named \"%s\" in table \"%s\", but it cannot be
 referenced from this part of the query.

 I think it's pretty clear that this general class of user error is common.
Moving this patch to CF 2014-12 as work is still going on, note that
it is currently marked with Robert as reviewer and that its current
status is Needs review.
-- 
Michael




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-14 Thread Michael Paquier
On Thu, Dec 11, 2014 at 3:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't really have any comments on the algorithms yet, having spent too
 much time trying to figure out underdocumented data structures to get to
 the algorithms.  However, noting the addition of list_intersection_int()
 made me wonder whether you'd not be better off reducing the integer lists
 to bitmapsets a lot sooner, perhaps even at parse analysis.
 list_intersection_int() is going to be O(N^2) by nature.  Maybe N can't
 get large enough to matter in this context, but I do see places that
 seem to be concerned about performance.

 I've not spent any real effort looking at gsp2.patch yet, but it seems
 even worse off comment-wise: if there's any explanation in there at all
 of what a chained aggregate is, I didn't find it.  I'd also counsel you
 to find some other way to do it than putting bool chain_head fields in
 Aggref nodes; that looks like a mess, eg, it will break equal() tests
 for expression nodes that probably should still be seen as equal.

 I took a quick look at gsp-u.patch.  It seems like that approach should
 work, with of course the caveat that using CUBE/ROLLUP as function names
 in a GROUP BY list would be problematic.  I'm not convinced by the
 commentary in ruleutils.c suggesting that extra parentheses would help
 disambiguate: aren't extra parentheses still going to contain grouping
 specs according to the standard?  Forcibly schema-qualifying such function
 names seems like a less fragile answer on that end.
Based on those comments, I am marking this patch as Returned with
Feedback on the CF app for 2014-10. Andrew, feel free to move this
entry to CF 2014-12 if you are planning to continue working on it so
as it would get additional review. (Note that this patch status was
Waiting on Author when writing this text).
Regards,
-- 
Michael




Re: [HACKERS] inherit support for foreign tables

2014-12-14 Thread Michael Paquier
On Thu, Dec 11, 2014 at 2:54 PM, Ashutosh Bapat
ashutosh.ba...@enterprisedb.com wrote:
 On Thu, Dec 11, 2014 at 8:39 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

 Hi Ashutosh,

 Thanks for the review!

 (2014/12/10 14:47), Ashutosh Bapat wrote:

 We haven't heard anything from Horiguchi-san and Hanada-san for almost a
 week. So, I am fine marking it as ready for committer. What do you say?
Moving this patch to CF 2014-12 with the same status. Let's get a
committer having a look at it.
-- 
Michael




Re: [HACKERS] GiST kNN search queue (Re: KNN-GiST with recheck)

2014-12-14 Thread Michael Paquier
On Thu, Dec 11, 2014 at 12:50 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/28/2014 04:12 PM, Alexander Korotkov wrote:

 3. A binary heap would be a better data structure to buffer the
  rechecked
 values. A Red-Black tree allows random insertions and deletions, but in
 this case you need to insert arbitrary values but only remove the
  minimum
 item. That's exactly what a binary heap excels at. We have a nice binary
 heap implementation in the backend that you can use, see
 src/backend/lib/binaryheap.c.
 

 Hmm. For me binary heap would be a better data structure for KNN-GiST at
 all :-)


 I decided to give this a shot, replacing the red-black tree in GiST with the
 binary heap we have in lib/binaryheap.c. It made the GiST code somewhat
 simpler, as the binaryheap interface is simpler than the red-black tree one.
 Unfortunately, performance was somewhat worse. That was quite surprising, as
 insertions and deletions are both O(log N) in both data structures, but the
 red-black tree implementation is more complicated.

 I implemented another data structure called a Pairing Heap. It's also a
 fairly simple data structure, but insertions are O(1) instead of O(log N).
 It also performs fairly well in practice.

 With that, I got a small but measurable improvement. To test, I created a
 table like this:

 create table gisttest (id integer, p point);
 insert into gisttest select id, point(random(), random()) from
 generate_series(1, 100) id;
 create index i_gisttest on gisttest using gist (p);

 And I ran this query with pgbench:

 select id from gisttest order by p <-> '(0,0)' limit 1000;

 With unpatched master, I got about 650 TPS, and with the patch 720 TPS.
 That's a nice little improvement, but perhaps more importantly, the pairing
 heap implementation consumes less memory. To measure that, I put a
 MemoryContextStats(so->queueCtx) call into gistendscan. With the above
 query, but without the limit clause, on master I got:

 GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998 chunks);
 21296 used

 And with the patch:

 GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks);
 21072 used

 That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice to
 reduce that memory consumption, as there is no hard upper bound on how much
 might be needed. If the GiST tree is really disorganized for some reason, a
 query might need a lot more.


 So all in all, I quite like this patch, even though it doesn't do anything
 too phenomenal. It adds a some code, in the form of the new pairing heap
 implementation, but it makes the GiST code a little bit simpler. And it
 gives a small performance gain, and reduces memory usage a bit.
Hum. It looks like this patch using a binary heap is intended as a
replacement for the red-black tree method. Any reason why it isn't added
to the CF to track it?
-- 
Michael
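
For readers unfamiliar with the structure Heikki mentions, here is a
minimal self-contained pairing-heap sketch (plain C for illustration
only; it is not the implementation from the patch and the names are
made up): insert melds a one-node heap into the root with a single
comparison, which is why it is O(1), and delete-min does the classic
two-pass pairwise meld of the root's children.

#include <stdio.h>
#include <stdlib.h>

typedef struct PHNode
{
    int     key;
    struct PHNode *child;   /* leftmost child */
    struct PHNode *sibling; /* next sibling to the right */
} PHNode;

/* Meld two heaps: the larger root becomes the leftmost child of the smaller. */
static PHNode *
meld(PHNode *a, PHNode *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    if (b->key < a->key)
    {
        PHNode *tmp = a;
        a = b;
        b = tmp;
    }
    b->sibling = a->child;
    a->child = b;
    return a;
}

/* O(1) insert: meld a one-node heap into the root. */
static PHNode *
ph_insert(PHNode *root, int key)
{
    PHNode *n = calloc(1, sizeof(PHNode));

    n->key = key;
    return meld(root, n);
}

/* Two-pass pairwise meld of a sibling list, amortized O(log N). */
static PHNode *
merge_pairs(PHNode *first)
{
    PHNode *second, *rest;

    if (first == NULL || first->sibling == NULL)
        return first;
    second = first->sibling;
    rest = second->sibling;
    first->sibling = second->sibling = NULL;
    return meld(meld(first, second), merge_pairs(rest));
}

static PHNode *
ph_delete_min(PHNode *root, int *min_out)
{
    PHNode *newroot;

    *min_out = root->key;
    newroot = merge_pairs(root->child);
    free(root);
    return newroot;
}

int
main(void)
{
    PHNode *h = NULL;
    int     keys[] = {5, 1, 4, 2, 3};
    int     k;

    for (int i = 0; i < 5; i++)
        h = ph_insert(h, keys[i]);
    while (h != NULL)
    {
        h = ph_delete_min(h, &k);
        printf("%d ", k);       /* prints 1 2 3 4 5 */
    }
    putchar('\n');
    return 0;
}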




Re: [HACKERS] KNN-GiST with recheck

2014-12-14 Thread Michael Paquier
On Tue, Jan 28, 2014 at 10:54 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 1. This patch introduces a new polygon <-> point operator. That seems
 useful on its own, with or without this patch.
This patch is tracked with this entry in the commit fest app and is
marked as Ready for committer. Hence I am moving this specific part
to 2014-12 to keep track of it:

 3. A binary heap would be a better data structure to buffer the rechecked
 values. A Red-Black tree allows random insertions and deletions, but in this
 case you need to insert arbitrary values but only remove the minimum item.
 That's exactly what a binary heap excels at. We have a nice binary heap
 implementation in the backend that you can use, see
 src/backend/lib/binaryheap.c.
Based on those comments, I am marking this entry as returned with feedback:
https://commitfest.postgresql.org/action/patch_view?id=1367
Heikki has sent as well a new patch to use a binary heap method
instead of the red-black tree here:
http://www.postgresql.org/message-id/54886bb8.9040...@vmware.com
IMO this last patch should be added in the CF app, that's not the case now.
Regards,
-- 
Michael




Re: [HACKERS] moving from contrib to bin

2014-12-14 Thread Peter Eisentraut
Here is a patch series to move

pg_archivecleanup
pg_standby
pg_test_fsync
pg_test_timing
pg_upgrade
pg_xlogdump
pgbench

from contrib/ to src/bin/.

Most people didn't like moving oid2name and vacuumlo, which I
understand.  So I'll leave those for now.

I have also included a patch to move pg_upgrade_support into the
backend, as discussed.

Open issues/discussion points:

- no Windows build system support yet

- I didn't move the source of the man pages to .../sgml/ref.  This could
be done.

- I didn't rename the SGML ids of the man pages to fit the pattern of
the other applications, because that would change the names of the HTML
pages.

- We have traditionally kept an individual author acknowledgement in the
man pages for contrib items.  Should those be removed now?

From 0a4b9a4eaf930b8e14b4b3e2077ca7706a65ab6a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut peter_e@gmx.net
Date: Sun, 14 Dec 2014 20:41:58 -0500
Subject: [PATCH 1/9] Sort SUBDIRS variable in src/bin/Makefile

The previous order appears to have been historically grown randomness.
---
 src/bin/Makefile | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/bin/Makefile b/src/bin/Makefile
index f03cc42..f0589f3 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -13,8 +13,16 @@ subdir = src/bin
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = initdb pg_ctl pg_dump \
-	psql scripts pg_config pg_controldata pg_resetxlog pg_basebackup
+SUBDIRS = \
+	initdb \
+	pg_basebackup \
+	pg_config \
+	pg_controldata \
+	pg_ctl \
+	pg_dump \
+	pg_resetxlog \
+	psql \
+	scripts
 
 ifeq ($(PORTNAME), win32)
 SUBDIRS += pgevent
-- 
2.2.0

From 403955c76136960d7d444e946ba2c20110d7e263 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut peter_e@gmx.net
Date: Sun, 14 Dec 2014 20:41:58 -0500
Subject: [PATCH 2/9] Move pg_archivecleanup from contrib/ to src/bin/

---
 contrib/Makefile   |  1 -
 contrib/pg_archivecleanup/Makefile | 18 --
 doc/src/sgml/contrib.sgml  |  1 -
 doc/src/sgml/reference.sgml|  1 +
 src/bin/Makefile   |  1 +
 {contrib => src/bin}/pg_archivecleanup/.gitignore  |  0
 src/bin/pg_archivecleanup/Makefile | 14 ++
 .../bin}/pg_archivecleanup/pg_archivecleanup.c |  2 +-
 8 files changed, 17 insertions(+), 21 deletions(-)
 delete mode 100644 contrib/pg_archivecleanup/Makefile
 rename {contrib => src/bin}/pg_archivecleanup/.gitignore (100%)
 create mode 100644 src/bin/pg_archivecleanup/Makefile
 rename {contrib => src/bin}/pg_archivecleanup/pg_archivecleanup.c (99%)

diff --git a/contrib/Makefile b/contrib/Makefile
index 195d447..c56050e 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -28,7 +28,6 @@ SUBDIRS = \
 		oid2name	\
 		pageinspect	\
 		passwordcheck	\
-		pg_archivecleanup \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
diff --git a/contrib/pg_archivecleanup/Makefile b/contrib/pg_archivecleanup/Makefile
deleted file mode 100644
index ab52390..000
--- a/contrib/pg_archivecleanup/Makefile
+++ /dev/null
@@ -1,18 +0,0 @@
-# contrib/pg_archivecleanup/Makefile
-
-PGFILEDESC = pg_archivecleanup - cleans archive when used with streaming replication
-PGAPPICON = win32
-
-PROGRAM = pg_archivecleanup
-OBJS	= pg_archivecleanup.o $(WIN32RES)
-
-ifdef USE_PGXS
-PG_CONFIG = pg_config
-PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
-else
-subdir = contrib/pg_archivecleanup
-top_builddir = ../..
-include $(top_builddir)/src/Makefile.global
-include $(top_srcdir)/contrib/contrib-global.mk
-endif
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index a698d0f..f21fa14 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -202,7 +202,6 @@ <title>Server Applications</title>
 part of the core <productname>PostgreSQL</productname> distribution.
   </para>
 
- &pgarchivecleanup;
  &pgstandby;
  &pgtestfsync;
  &pgtesttiming;
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..62267db 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -257,6 +257,7 @@ <title>PostgreSQL Server Applications</title>
   </partintro>
 
    &initdb;
+   &pgarchivecleanup;
    &pgControldata;
    &pgCtl;
    &pgResetxlog;
diff --git a/src/bin/Makefile b/src/bin/Makefile
index f0589f3..0e7b183 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = \
 	initdb \
+	pg_archivecleanup \
 	pg_basebackup \
 	pg_config \
 	pg_controldata \
diff --git a/contrib/pg_archivecleanup/.gitignore b/src/bin/pg_archivecleanup/.gitignore
similarity index 100%
rename from contrib/pg_archivecleanup/.gitignore
rename to src/bin/pg_archivecleanup/.gitignore
diff --git a/src/bin/pg_archivecleanup/Makefile b/src/bin/pg_archivecleanup/Makefile
new file mode 100644
index 

Re: [HACKERS] pgcrypto: PGP signatures

2014-12-14 Thread Michael Paquier
On Wed, Nov 12, 2014 at 7:05 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sat, Nov 1, 2014 at 7:52 AM, Marko Tiikkaja ma...@joh.to wrote:

 Hi,

 I discovered a problem with the lack of MDC handling in the signature info
 extraction code, so I've fixed that and added a test message.  v9 here.




 Hi Marko,

 I get a segfault when the length of the message is exactly 16308 bytes, see
 attached perl script.

 I can't get a backtrace, for some reason it acts as if there were no debug
 symbols despite that I built with them.  I've not seen that before.

 I get this whether or not the bug 11905 patch is applied, so the problem
 seems to be related but different.
This patch's status was Ready for committer, but it still visibly has
some bugs and has not been updated in a while, as pointed out by Jeff.
So I am switching it to Returned with feedback.
Regards,
-- 
Michael




Re: [HACKERS] moving from contrib to bin

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 10:59 AM, Peter Eisentraut pete...@gmx.net wrote:
 - no Windows build system support yet
Do you need some help here? I am getting worried about potential
breakage with the version information built with MSVC and MinGW.

 - We have traditionally kept an individual author acknowledgement in the
 man pages for contrib items.  Should those be removed now?
+1.
-- 
Michael




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-14 Thread Michael Paquier
On Fri, Dec 12, 2014 at 6:39 AM, Robert Haas robertmh...@gmail.com wrote:
 It's probably something we should add, but there's enough to do
 getting the basic feature working first.
Moving this patch to CF 2014-12 as work is still going on.
-- 
Michael




Re: [HACKERS] On partitioning

2014-12-14 Thread Amit Langote

Alvaro wrote:
 Claudio Freire wrote:
 
  Fair enough, but that's not the same as not requiring easy proofs. The
  planner might not the one doing the proofs, but you still need proofs.
 
  Even if the proving method is hardcoded into the partitioning method,
  as in the case of list or range partitioning, it's still a proof. With
  arbitrary functions (which is what prompted me to mention proofs) you
  can't do that. A function works very well for inserting, but not for
  selecting.
 
  I could be wrong though. Maybe there's a way to turn SQL functions
  into analyzable things? But it would still be very easy to shoot
  yourself in the foot by writing one that is too complex.
 
 Arbitrary SQL expressions (including functions) are not the thing to use
 for partitioning -- at least that's how I understand this whole
 discussion.  I don't think you want to do proofs as such -- they are
 expensive.
 

This means if a user puts arbitrary expressions in a partition definition, say,

... FOR VALUES  extract(month from current_date) TO extract(month from 
current_date + interval '3 months'),

we make sure that those expressions are pre-computed to literal values. The 
exact time when that happens is open for discussion I guess. It could be either 
DDL time or, if feasible, during relation cache building when we compute the 
value from pg_node_tree of this expression which we may choose to store in the 
partition definition catalog. The former entails an obvious challenge of 
figuring out how we store the computed value into catalog (pg_node_tree of a 
Const?).

 To make this discussion a bit clearer, there are two things to
 distinguish: one is routing tuples, when an INSERT or COPY command
 references the partitioned table, into the individual partitions
 (ingress); the other is deciding which partitions to read when a SELECT
 query wants to read tuples from the partitioned table (egress).
 
 On ingress, what you want is something like being able to do something
 on the tuple that tells you which partition it belongs into.  Ideally
 this is something much lighter than running an expression; if you can
 just apply an operator to the partitioning column values, that should be
 plenty fast.  This requires no proof.
 

And I am thinking this is all executor stuff.
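
To make the ingress idea above concrete, here is a generic sketch of
what such routing could look like once range bounds have been reduced
to literals (plain C, not any actual or proposed PostgreSQL API): a
binary search over an ordered array of upper bounds, one comparison
per probe and no expression evaluation.

#include <stdio.h>

/*
 * upper_bounds[i] is the exclusive upper bound of partition i; a real
 * implementation would also handle values beyond the last bound (an
 * overflow/default partition) and would use the column type's
 * comparison operator rather than plain int comparison.
 */
static int
route_partition(const int *upper_bounds, int nparts, int value)
{
    int     lo = 0;
    int     hi = nparts - 1;

    while (lo < hi)
    {
        int     mid = (lo + hi) / 2;

        if (value < upper_bounds[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

int
main(void)
{
    int     bounds[] = {10, 20, 30, 40};    /* four range partitions */

    printf("%d\n", route_partition(bounds, 4, 25));     /* -> partition 2 */
    return 0;
}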

 On egress you need some direct way to compare the scan quals with the
 partitioning values.  I would imagine this to be similar to how scan
 quals are compared to the values stored in a BRIN index: each scan qual
 has a corresponding operator strategy and a scan key, and you can say
 aye or nay based on a small set of operations that can be run
 cheaply, again without any proof or running arbitrary expressions.
 

My knowledge of this is far from perfect, so to clear up any confusion:

As far as planning is concerned, I could not imagine how the index access
method way of pruning partitions could be made to work. Of course, I may be
missing something.

When you say a scan qual has a corresponding operator strategy, I'd think
that is part of the scan key in the executor, no?

Thanks,
Amit






Re: [HACKERS] [PATCH] add ssl_protocols configuration option

2014-12-14 Thread Michael Paquier
On Thu, Nov 27, 2014 at 8:51 PM, Alex Shulgin a...@commandprompt.com wrote:

 Dag-Erling Smørgrav d...@des.no writes:

 Alex Shulgin a...@commandprompt.com writes:
 OK, looks like I've come up with something workable: I've added
 sslprotocol connection string keyword similar to pre-existing
 sslcompression, etc.  Please see attached v2 of the original patch.
 I'm having doubts about the name of openssl.h header though,
 libpq-openssl.h?

 Perhaps ssloptions.[ch], unless you plan to add non-option-related code
 there later?

 I don't think anything else than common options-related code fits in
 there, so ssloptions.c makes sense to me.

 BTW, there is no Regent code in your openssl.c, so the copyright
 statement is incorrect.

 Good catch, I just blindly copied that from some other file.
There have been arguments both for and against this patch... but I am
marking it as returned with feedback because of a lack of activity in
the last couple of weeks. Feel free to update it if you think that's not
correct, and please move this patch to commit fest 2014-12.
--
Michael




Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-12-14 Thread Michael Paquier
There hasn't been much activity on this patch these days, and Andres
has provided some feedback, hence switching to Returned with feedback.
Regards,
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
 Peter, while reviewing the better performing patch myself, now i think the
 patch needs more work to be committed. The structuring of the method will be
 confusing in the long term. I think I'll restructure the patch in the next
 commitfest.
 So i propose to break the patch:
 1. We apply the current patch which uses xmlNodeCopy, so that the
 long-standing bug will be fixed in postgres.
 2. I'll work with the performance enhancement in the next commitfest.

 Maybe for (2), the current better-performing patch can be viewed as PoC of
 the expected performance.

Ali, are you currently working on that? Would you mind re-creating new
entries in the commit fest app for the new set of patches that you are
planning to do?
For now I am switching this patch to returned with feedback.
Thanks,
-- 
Michael




Re: [HACKERS] Add generate_series(numeric, numeric)

2014-12-14 Thread Andrew Gierth
 Fujii == Fujii Masao masao.fu...@gmail.com writes:

 Fujii Pushed.

Bug found:

regression=# select count(*) from generate_series(1::numeric,10) v, 
generate_series(1,v) w;
 count 
-------
     0
(1 row)

regression=# select count(*) from generate_series(1::numeric,10) v, 
generate_series(1,v+0) w;
 count 
-------
    55
(1 row)

The error is in the use of PG_GETARG_NUMERIC and init_var_from_num
when setting up the multi-call state; init_var_from_num points at the
original num's digits rather than copying them, but start_num and
stop_num have just been (potentially) detoasted in the per-call
context, in which case the storage will have been freed by the next
call.

Obviously this could also be fixed by not detoasting the input until
after switching to the multi-call context, but it looks to me like
that would be unnecessarily complex.

Suggested patch attached.

(Is it also worth putting an explicit warning about this kind of thing
in the SRF docs?)

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index c73f9bc..d841b6f 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -1325,11 +1325,16 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 
 		/*
 		 * Use fctx to keep state from call to call. Seed current with the
-		 * original start value.
+		 * original start value. We must copy the start_num and stop_num
+		 * values rather than pointing to them, since we may have detoasted
+		 * them in the per-call context.
 		 */
-		init_var_from_num(start_num, &fctx->current);
-		init_var_from_num(stop_num, &fctx->stop);
+		init_var(&fctx->current);
+		init_var(&fctx->stop);
 		init_var(&fctx->step);
+
+		set_var_from_num(start_num, &fctx->current);
+		set_var_from_num(stop_num, &fctx->stop);
 		set_var_from_var(&steploc, &fctx->step);
 
 		funcctx->user_fctx = fctx;
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
index ee6cb50..9d68145 100644
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -1461,3 +1461,41 @@ select (i / (10::numeric ^ 131071))::numeric(1,0)
9
 (4 rows)
 
+-- Check usage with variables
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(i,3) j;
+ i | j 
+---+---
+ 1 | 1
+ 1 | 2
+ 1 | 3
+ 2 | 2
+ 2 | 3
+ 3 | 3
+(6 rows)
+
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(1,i) j;
+ i | j 
+---+---
+ 1 | 1
+ 2 | 1
+ 2 | 2
+ 3 | 1
+ 3 | 2
+ 3 | 3
+(6 rows)
+
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(1,5,i) j;
+ i | j 
+---+---
+ 1 | 1
+ 1 | 2
+ 1 | 3
+ 1 | 4
+ 1 | 5
+ 2 | 1
+ 2 | 3
+ 2 | 5
+ 3 | 1
+ 3 | 4
+(10 rows)
+
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
index a7e92ac..1633e4c 100644
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -854,3 +854,7 @@ select (i / (10::numeric ^ 131071))::numeric(1,0)
 	from generate_series(6 * (10::numeric ^ 131071),
 			 9 * (10::numeric ^ 131071),
 			 10::numeric ^ 131071) as a(i);
+-- Check usage with variables
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(i,3) j;
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(1,i) j;
+select * from generate_series(1::numeric, 3::numeric) i, generate_series(1,5,i) j;



Re: [HACKERS] alter user/role CURRENT_USER

2014-12-14 Thread Michael Paquier
On Fri, Nov 14, 2014 at 5:39 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hi, this is revised version.

 Kyotaro HORIGUCHI wrote:

  - Storage for new information
 
  The new struct NameId stores an identifier which telling what it
  logically is using the new enum NameIdTypes.

 I think NameId is a bad name for this.  My point is that NameId, as it
 stands, might be a name for anything, not just a role; and the object it
 identifies is not an Id either.  Maybe RoleSpec?

 Yeah! I felt it no good even if it were a generic type for
 various Name of something or its oid. RoleSpec sounds much better.

 Do we need a public_ok
 argument to get_nameid_oid() (get a better name for this function too)

 Maybe get_rolespec_oid() as a name ofter its parameter type?

 so that callers don't have to check for InvalidOid argument?  I think
 the arrangement you propose is not very convenient; it'd be best to
 avoid duplicating the check for InvalidOid in all callers of the new
 function, particularly where there was no check before.

 I agree that It'd be better keeping away from duplicated
 InvalidOid checks, but public_ok seems a bit myopic. Since
 there's no reasonable border between functions accepting 'public'
 and others, such kind of solution would not be reasonable..

 What about checking it being a PUBLIC or not *before* calling
 get_rolespec_oid()?

 The attached patch modified in the following points.

  - rename NameId to RoleSpec and NameIdType to RoleSpecTypes.
  - rename get_nameid_oid() to get_rolespec_oid().
  - rename roleNamesToIds() to roleSpecsToIds().
  - some struct members are changed such as authname to authrole.
  - check if rolespec is public or not before calling get_rolespec_oid()
  - ExecAlterDefaultPrivilegesStmt and ExecuteGrantStmt does
slightly different things about ACL_ID_PUBLIC but I unified it
to the latter.
  - rebased to the current master
A new patch has been sent here but no review occurred, hence moving
this item to CF 2014-12.
-- 
Michael




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-14 Thread Andrew Gierth
 Michael == Michael Paquier michael.paqu...@gmail.com writes:

 Michael Based on those comments, I am marking this patch as
 Michael Returned with Feedback on the CF app for 2014-10. Andrew,
 Michael feel free to move this entry to CF 2014-12 if you are
 Michael planning to continue working on it so as it would get
 Michael additional review. (Note that this patch status was Waiting
 Michael on Author when writing this text).

Moved it to 2014-12 and set it back to waiting on author. We expect to
submit a revised version, though I have no timescale yet.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-14 Thread Tom Lane
Amit Kapila amit.kapil...@gmail.com writes:
 On Mon, Dec 15, 2014 at 5:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What makes you think quote isn't allowed in tablespace paths?

 Below part of code makes me think that quote is not allowed.

 Oid
 CreateTableSpace(CreateTableSpaceStmt *stmt)
 {
 ..
 /* disallow quotes, else CREATE DATABASE would be at risk */
 if (strchr(location, '\''))
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_NAME),
 errmsg("tablespace location cannot contain single quotes")));
 }

Hm, I think that's left over from defending a *very* ancient version
of CREATE DATABASE.  In any case, as I mentioned, any limitations
we might be putting on tablespace paths during SQL-level operation
are pretty much a dead letter.

regards, tom lane




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 12:28 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 Michael == Michael Paquier michael.paqu...@gmail.com writes:

  Michael Based on those comments, I am marking this patch as
  Michael Returned with Feedback on the CF app for 2014-10. Andrew,
  Michael feel free to move this entry to CF 2014-12 if you are
  Michael planning to continue working on it so as it would get
  Michael additional review. (Note that this patch status was Waiting
  Michael on Author when writing this text).

 Moved it to 2014-12 and set it back to waiting on author. We expect to
 submit a revised version, though I have no timescale yet.
OK thanks for the update.
-- 
Michael




Re: [HACKERS] Escaping from blocked send() reprised.

2014-12-14 Thread Michael Paquier
On Thu, Oct 30, 2014 at 10:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
 [blah]
 About patch 3:
 Looking closer, this design still looks OK to me. As you said yourself, the
 comments need some work (e.g. the step 5. in the top comment in async.c
 needs updating). And then there are a couple of FIXME and XXX comments that
 need to be addressed.
Those patches have not been updated in a while, and I am seeing some
feedback from several people, hence marking this as returned with
feedback. Horiguchi-san, feel free to add new entries on the CF app in
2014-12 or move this entry if you feel otherwise.
Regards,
-- 
Michael




Re: [HACKERS] alter user set local_preload_libraries.

2014-12-14 Thread Michael Paquier
On Tue, Dec 9, 2014 at 11:56 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 8, 2014 at 9:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Barring someone committing to spend the time to improve that situation
 (time that would be poorly invested IMO), I don't think that we want to
 open up ignore_system_indexes as USERSET, or do anything else to encourage
 its use.

 If we're intent on removing PGC_BACKEND then I'd be okay with
 reclassifying ignore_system_indexes as SUSET; but I'm not exactly
 convinced that we should be trying to get rid of PGC_BACKEND.

 Well, if you want to discourage its use, I think there's an argument
 that marking it as SUSET would be more restrictive than what we have
 at present, since it would altogether prohibit non-superuser use.

 I'm not wedded to the idea of getting rid of PGC_BACKEND, but I do
 like it.  Peter's survey of the landscape seems to show that there's
 very little left in that category and the stuff that is there is
 somewhat uninspiring.  And simplifying things is always nice.

Documentation fixes for the use of local_preload_libraries have been
pushed; now there has been some wider discussion about changing the
mode of a couple of parameters since PGC_SU_BACKEND was introduced. Any
objections to switching this patch to Returned with feedback? The
discussion here is broader than the simple use of
local_preload_libraries in any case.
-- 
Michael




Re: [HACKERS] RLS with check option - surprised design

2014-12-14 Thread Michael Paquier
On Sat, Nov 22, 2014 at 5:59 AM, Stephen Frost sfr...@snowman.net wrote:
 * Peter Geoghegan (p...@heroku.com) wrote:
 On Fri, Nov 21, 2014 at 6:57 AM, Stephen Frost sfr...@snowman.net wrote:
  [blah]
 [re-blah]
 [re-re-blah]
This patch has already been committed, but there are still many
concerns shared, so moving it to CF 2014-12 as it needs more review.
-- 
Michael




Re: [HACKERS] Status of CF 2014-10 and upcoming 2014-12

2014-12-14 Thread Michael Paquier
On Sun, Dec 14, 2014 at 12:00 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 In any case, I will be able to do that on Monday.
 Opinions?

So I have been through all the remaining 20~25 patches of CF 2014-10,
and roughly moved them to the next CF 2014-12 or returned them with
feedback depending on their state and their stale period. If you feel
that your patch has not been treated correctly or has faced an
injustice, feel free to complain on this thread or on the thread of
the related patch. In any case, let's now move on to CF 2014-12, and
please be sure to check the status of the patch you submitted
previously if it was still in the works.

PS: Could someone close CF 2014-10, btw?
Regards,
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
  Peter, while reviewing the better performing patch myself, now i think
 the
  patch needs more work to be committed. The structuring of the method
 will be
  confusing in the long term. I think I'll restructure the patch in the
 next
  commitfest.
  So i propose to break the patch:
  1. We apply the current patch which uses xmlNodeCopy, so that the
  long-standing bug will be fixed in postgres.
  2. I'll work with the performance enhancement in the next commitfest.
 
  Maybe for (2), the current better-performing patch can be viewed as PoC
 of
  the expected performance.

 Ali, are you currently working on that? Would you mind re-creating new
 entries in the commit fest app for the new set of patches that you are
 planning to do?
 For now I am switching this patch as returned with feedback.
 Thanks,


What I mean is that the last patch (the v7 patch), as it is, is enough to
fix the bug (the nested xpath namespace problem). I think the performance
regression is still acceptable (in my case it's ~20%), because the bug is
severe. Currently, xpath can return invalid xml because the namespace isn't
included in the output!

What I'll be working on is the v4 patch, because it turns out the v4 patch
has better performance (~10%, though Peter's test shows it isn't the case).
But the problem is that the v4 patch is organized wrongly, and hacks around
libxml's xml node structure (duplicating the namespace on the existing
structure). I'll work on that, but it will affect the performance benefit.

So what I propose is: we close the longstanding bug in this commitfest with
the v7 patch. I'll work on improving the performance without compromising
good code structure. If the result is good, I'll submit the patch.

Thanks

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 11:02 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
  Peter, while reviewing the better performing patch myself, now i think
 the
  patch needs more work to be committed. The structuring of the method
 will be
  confusing in the long term. I think I'll restructure the patch in the
 next
  commitfest.
  So i propose to break the patch:
  1. We apply the current patch which uses xmlNodeCopy, so that the
  long-standing bug will be fixed in postgres.
  2. I'll work with the performance enhancement in the next commitfest.
 
  Maybe for (2), the current better-performing patch can be viewed as PoC
 of
  the expected performance.

 Ali, are you currently working on that? Would you mind re-creating new
 entries in the commit fest app for the new set of patches that you are
 planning to do?
 For now I am switching this patch as returned with feedback.
 Thanks,


 What i mean, the last patch (v7 patch) as it is is enough to fix the bug
 (nested xpath namespace problem). I think the performance regression is
 still acceptable (in my case it's ~20%), because the bug is severe.
 Currently, xpath can return invalid xml because the namespace isn't
 included in the output!


Sorry, typo here. What I mean isn't "invalid" xml, but "incomplete" xml.
Hence the next call to xpath with the previous result as its input will
become a problem because the namespaces will not match.



 What i'll be working is the v4 patch, because it turns out the v4 patch
 has better performance (~10%, but Peter's test shows it isn't the case).
 But, the problem is the v4 patch is organized wrongly, and hacks around the
 libxml's xml node structure (duplicating the namespace on the existing
 structure). I'll work on that, but it will affects the performance benefit.

 So what i propose is, we close the longstanding bug in this comitfest with
 the v7 patch. I'll work on improving the performance, without compromising
 good code structure. If the result is good, i'll submit the patch.

 Thanks


Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote:
 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
  Peter, while reviewing the better performing patch myself, now i think
  the
  patch needs more work to be committed. The structuring of the method
  will be
  confusing in the long term. I think I'll restructure the patch in the
  next
  commitfest.
  So i propose to break the patch:
  1. We apply the current patch which uses xmlNodeCopy, so that the
  long-standing bug will be fixed in postgres.
  2. I'll work with the performance enhancement in the next commitfest.
 
  Maybe for (2), the current better-performing patch can be viewed as PoC
  of
  the expected performance.

 Ali, are you currently working on that? Would you mind re-creating new
 entries in the commit fest app for the new set of patches that you are
 planning to do?
 For now I am switching this patch as returned with feedback.
 Thanks,


 What i mean, the last patch (v7 patch) as it is is enough to fix the bug
 (nested xpath namespace problem). I think the performance regression is
 still acceptable (in my case it's ~20%), because the bug is severe.
 Currently, xpath can return invalid xml because the namespace isn't included
 in the output!

 What i'll be working is the v4 patch, because it turns out the v4 patch has
 better performance (~10%, but Peter's test shows it isn't the case). But,
 the problem is the v4 patch is organized wrongly, and hacks around the
 libxml's xml node structure (duplicating the namespace on the existing
 structure). I'll work on that, but it will affects the performance benefit.

 So what i propose is, we close the longstanding bug in this comitfest with
 the v7 patch. I'll work on improving the performance, without compromising
 good code structure. If the result is good, i'll submit the patch.
OK. Could you then move this patch to the new CF with Needs Review,
with Peter as reviewer? He seems to be looking at it anyway, judging
from the update on 12/11.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 11:06 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote:
  2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:
 
  On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
   Peter, while reviewing the better performing patch myself, now i think
   the
   patch needs more work to be committed. The structuring of the method
   will be
   confusing in the long term. I think I'll restructure the patch in the
   next
   commitfest.
   So i propose to break the patch:
   1. We apply the current patch which uses xmlNodeCopy, so that the
   long-standing bug will be fixed in postgres.
   2. I'll work with the performance enhancement in the next commitfest.
  
   Maybe for (2), the current better-performing patch can be viewed as
 PoC
   of
   the expected performance.
 
  Ali, are you currently working on that? Would you mind re-creating new
  entries in the commit fest app for the new set of patches that you are
  planning to do?
  For now I am switching this patch as returned with feedback.
  Thanks,
 
 
  What i mean, the last patch (v7 patch) as it is is enough to fix the bug
  (nested xpath namespace problem). I think the performance regression is
  still acceptable (in my case it's ~20%), because the bug is severe.
  Currently, xpath can return invalid xml because the namespace isn't
 included
  in the output!
 
  What i'll be working is the v4 patch, because it turns out the v4 patch
 has
  better performance (~10%, but Peter's test shows it isn't the case). But,
  the problem is the v4 patch is organized wrongly, and hacks around the
  libxml's xml node structure (duplicating the namespace on the existing
  structure). I'll work on that, but it will affects the performance
 benefit.
 
  So what i propose is, we close the longstanding bug in this comitfest
 with
  the v7 patch. I'll work on improving the performance, without
 compromising
  good code structure. If the result is good, i'll submit the patch.
 OK. Could you then move this patch to the new CF with Needs Review

with Peter as reviewer then? He seems to be looking at it anyway
 seeing the update from 12/11.


OK. Moved to the new CF.

Regards,
-- 
Ali Akbar


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-12-14 Thread Amit Kapila
On Sat, Dec 13, 2014 at 9:41 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 11/20/2014 02:27 AM, Amit Kapila wrote:

 On Wed, Nov 19, 2014 at 11:46 PM, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Nov 18, 2014 at 9:19 AM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
   Right, but they provide same functionality as symlinks and now we
   are even planing to provide this feature for both linux and windows
as
   both Tom and Robert seems to feel, it's better that way.  Anyhow,
   I think naming any entity generally differs based on individual's
   perspective, so we can go with the name which appeals to more
people.
   In case, nobody else has any preference, I will change it to what
both
   of us can agree upon (either 'tablespace catalog',
'tablespace_info' ...).
  
   Well, I have made my argument.  Since you're the submitter, feel
free to
   select what you think is the best name.
 
  For what it's worth, I, too, dislike having symlink in the name.
  Maybe tablespace_map?

 Sounds good to me as well.

 To summarize the situation of this patch, I have received below comments
 on which I am planning to work:

 1. Change the name of file containing tablespace path information.
 2. Store tablespace name as well along with oid and path to make the
 information Human readable.
 3. Make the code generic (Remove #ifdef Win32 macro's and change
 comments referring this functionality for windows and see if any more
 changes are required to make it work on linux.)

 Now the part where I would like to receive feedback before revising the
 patch is on the coding style.  It seems to me from Tom's comments that
 he is not happy with the code, now I am not sure which part of the patch
 he thinks needs change.  Tom if possible, could you be slightly more
 specific about your concern w.r.t code?

 I have attached a rebased (on top of commit-8d7af8f) patch, just incase
 some one wants to apply and check it.



 In view of the request above for comments from Tom, I have moved this
back to Needs Review.


I am working on fixing the review comments, but I think I won't be
able to handle all of them, as there is still discussion going on for
some of the comments; still, I intend to work on it for the CF starting
today. So I have moved this patch to CF 2014-12.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Add generate_series(numeric, numeric)

2014-12-14 Thread Ali Akbar
2014-12-15 10:25 GMT+07:00 Andrew Gierth and...@tao11.riddles.org.uk:

  Fujii == Fujii Masao masao.fu...@gmail.com writes:

  Fujii Pushed.

 Bug found:

 regression=# select count(*) from generate_series(1::numeric,10) v,
 generate_series(1,v) w;
  count
 ---
  0
 (1 row)

 regression=# select count(*) from generate_series(1::numeric,10) v,
 generate_series(1,v+0) w;
  count
 ---
 55
 (1 row)

 The error is in the use of PG_GETARG_NUMERIC and init_var_from_num
 when setting up the multi-call state; init_var_from_num points at the
 original num's digits rather than copying them, but start_num and
 stop_num have just been (potentially) detoasted in the per-call
 context, in which case the storage will have been freed by the next
 call.


Oops.

Obviously this could also be fixed by not detoasting the input until
 after switching to the multi-call context, but it looks to me like
 that would be unnecessarily complex.

 Suggested patch attached.


Thanks


 (Is it also worth putting an explicit warning about this kind of thing
 in the SRF docs?)


 I think yes, it will be good. The alternative is restructuring this
paragraph in the SRF docs:

The memory context that is current when the SRF is called is a transient
 context that will be cleared between calls. This means that you do not need
 to call pfree on everything you allocated using palloc; it will go away
 anyway. However, if you want to allocate any data structures to live across
 calls, you need to put them somewhere else. The memory context referenced
 by multi_call_memory_ctx is a suitable location for any data that needs
 to survive until the SRF is finished running. In most cases, this means
 that you should switch into multi_call_memory_ctx while doing the
 first-call setup.


The important part "However, if you want to allocate any data structures to
live across calls, you need to put them somewhere else." is buried in the
docs.

But I think an explicit warning is more noticeable for a new developer like me.

Regards,
-- 
Ali Akbar


[HACKERS] Commit fest 2014-12, let's begin!

2014-12-14 Thread Michael Paquier
Hi all,

Let's begin the commit fest of December:
https://commitfest.postgresql.org/action/commitfest_view?id=25

Looking at the stats at the top of the above page, there are currently
56 patches to be dealt with:
- Needs Review: 48
- Waiting on Author: 5
- Ready for Committer: 3
In short, we are in dire need of reviewers to keep things moving.

To reviewers:
- This commit fest may be a good occasion for you to study new parts
of the code; there are many simple patches in need of love.
- If you are marked as a reviewer for a given patch, but you know that
you won't have time to look at it, it is advised that you remove your
name to give room to someone else.
- Be sure to keep the information on the patch you are reviewing fresh
in the CF app. This takes at most 2 minutes.
- Feel free to ping the patch submitter if what you reviewed does not
get updated. People have their daily jobs and are busy, so try not to
be too pushy. A delay of one week may be a good baseline, but that's
holiday season in many places.

And to patch submitters:
- Please review one patch of difficulty equivalent to your own patch,
to help keep the balance between reviews and patches
- Be sure to keep the information about your patch up-to-date
- Feel free to ping your reviewer if you have no news and need more feedback

To committers: there are 2 patches marked as ready for committer from
the previous commit fest:
- Foreign table inheritance
- Point to polygon distance operator

This is going to be a period of vacations for many people here, so
let's do our best.
Regards,
-- 
Michael




Re: [HACKERS] Add generate_series(numeric, numeric)

2014-12-14 Thread Andrew Gierth
 Ali == Ali Akbar the.ap...@gmail.com writes:

 Ali  I think yes, it will be good. The alternative is restructuring
 Ali this paragraph in the SRF docs:

  The memory context that is current when the SRF is called is a
  transient context that will be cleared between calls. This means
  that you do not need to call pfree on everything you allocated
  using palloc; it will go away anyway. However, if you want to
  allocate any data structures to live across calls, you need to put
  them somewhere else. The memory context referenced by
  multi_call_memory_ctx is a suitable location for any data that
  needs to survive until the SRF is finished running. In most cases,
  this means that you should switch into multi_call_memory_ctx while
  doing the first-call setup.

 Ali The important part However, if you want to allocate any data
 Ali structures to live across calls, you need to put them somewhere
 Ali else. is buried in the docs.

 Ali But i think explicit warning is more noticeable for new
 Ali developer like me.

I was thinking something like this, added just after that para:

<warning>
 <para>
  While the actual arguments to the function remain unchanged between
  calls, if you detoast the argument values (which is normally done
  transparently by the
  <function>PG_GETARG_<replaceable>xxx</replaceable></function> macro)
  in the transient context then the detoasted copies will be freed on
  each cycle. Accordingly, if you keep references to such values in
  your <structfield>user_fctx</>, you must either copy them into the
  <structfield>multi_call_memory_ctx</> after detoasting, or ensure
  that you detoast the values only in that context.
 </para>
</warning>
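
For instance, two safe ways to handle a toastable text argument that is
kept in user_fctx (a fragment assumed to sit inside the SRF_IS_FIRSTCALL()
setup block; state and pattern are hypothetical names):

    /*
     * Option 1: detoast while multi_call_memory_ctx is current, so the
     * detoasted copy survives until the SRF is done.
     */
    oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
    state->pattern = PG_GETARG_TEXT_PP(0);
    MemoryContextSwitchTo(oldcontext);

    /*
     * Option 2: detoast in the transient context, then copy the result
     * into multi_call_memory_ctx before keeping a reference to it.
     */
    text   *tmp = PG_GETARG_TEXT_PP(0);

    state->pattern = (text *) MemoryContextAlloc(funcctx->multi_call_memory_ctx,
                                                 VARSIZE_ANY(tmp));
    memcpy(state->pattern, tmp, VARSIZE_ANY(tmp));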

-- 
Andrew (irc:RhodiumToad)




[HACKERS] Logical Replication Helpers WIP for discussion

2014-12-14 Thread Petr Jelinek

Hello,

we've made a few helper functions to make logical replication easier. I 
bundled them into a contrib module as this is mainly for discussion at this 
time (I don't expect this to get committed any time soon, but it is a good 
way to iron out the protocol, etc).


I created a sample logical decoding plugin that uses those functions and 
that can be used for passing DML changes in a (hopefully) platform- and 
version-independent format.
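
For context, here is a hedged sketch of what such a plugin's change
callback might look like using the lrep_write_* helpers declared in lrep.h
below (the callback shape follows the standard output plugin API; the
function name and everything not in the patch are assumptions on my part,
and error handling and replica identity checks are omitted):

static void
my_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation rel, ReorderBufferChange *change)
{
    /* output_plugin_private is assumed to be set up in the startup callback */
    LREPOutputData *data = ctx->output_plugin_private;

    OutputPluginPrepareWrite(ctx, true);

    switch (change->action)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            lrep_write_insert(data, ctx->out, rel,
                              &change->data.tp.newtuple->tuple);
            break;
        case REORDER_BUFFER_CHANGE_UPDATE:
            lrep_write_update(data, ctx->out, rel,
                              change->data.tp.oldtuple ?
                              &change->data.tp.oldtuple->tuple : NULL,
                              &change->data.tp.newtuple->tuple);
            break;
        case REORDER_BUFFER_CHANGE_DELETE:
            lrep_write_delete(data, ctx->out, rel,
                              &change->data.tp.oldtuple->tuple);
            break;
        default:
            break;
    }

    OutputPluginWrite(ctx, true);
}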


I will also post a sample apply bgworker once I get some initial feedback 
about this.


It's hard to write tests for this, as the binary changes contain 
transaction IDs and timestamps, so the data changes constantly.


This is of course based on the BDR work Andres, Craig and myself have 
been doing.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/contrib/lrep/Makefile b/contrib/lrep/Makefile
new file mode 100644
index 000..d3f7ba3
--- /dev/null
+++ b/contrib/lrep/Makefile
@@ -0,0 +1,20 @@
+# contrib/lrep/Makefile
+
+MODULE_big = lrep
+OBJS = lrep_utils.o lrep_output.o $(WIN32RES)
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+#EXTENSION = lrep
+#REGRESS = lrep
+#REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/lrep/logical.conf
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/lrep
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/lrep/lrep.h b/contrib/lrep/lrep.h
new file mode 100644
index 000..e2910c4
--- /dev/null
+++ b/contrib/lrep/lrep.h
@@ -0,0 +1,108 @@
+/*-
+ *
+ * lrep.h
+ *		LREP public interfaces
+ *
+ * Copyright (c) 2012-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/lrep/lrep.h
+ *
+ *-
+ */
+#ifndef LREP_H
+#define LREP_H
+
+#include "libpq-fe.h"
+
+#include "nodes/parsenodes.h"
+
+#include "replication/logical.h"
+#include "replication/output_plugin.h"
+
+#include "storage/lock.h"
+
+#define LREP_PROTO_VERSION_NUM 1
+#define LREP_PROTO_MIN_REMOTE_VERSION_NUM 1
+
+typedef struct
+{
+	MemoryContext context;
+
+	bool allow_binary_protocol;
+	bool allow_sendrecv_protocol;
+	bool int_datetime_mismatch;
+	bool forward_changesets;
+
+	uint32 client_pg_version;
+	uint32 client_pg_catversion;
+	uint32 client_proto_version;
+	uint32 client_min_proto_version;
+	size_t client_sizeof_int;
+	size_t client_sizeof_long;
+	size_t client_sizeof_datum;
+	size_t client_maxalign;
+	bool client_bigendian;
+	bool client_float4_byval;
+	bool client_float8_byval;
+	bool client_int_datetime;
+	char *client_db_encoding;
+
+	void *extra;		/* Additional data */
+} LREPOutputData;
+
+
+typedef struct LREPTupleData
+{
+	Datum	values[MaxTupleAttributeNumber];
+	bool	nulls[MaxTupleAttributeNumber];
+	bool	changed[MaxTupleAttributeNumber];
+} LREPTupleData;
+
+
+extern void lrep_write_begin(StringInfo out, ReorderBufferTXN *txn, int flags,
+			 StringInfo extradata);
+extern void lrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
+			  XLogRecPtr commit_lsn, int flags,
+			  StringInfo extradata);
+
+extern void lrep_write_insert(LREPOutputData *data, StringInfo out,
+			  Relation rel, HeapTuple newtuple);
+extern void lrep_write_update(LREPOutputData *data, StringInfo out,
+			  Relation rel, HeapTuple oldtuple,
+			  HeapTuple newtuple);
+extern void lrep_write_delete(LREPOutputData *data, StringInfo out,
+			  Relation rel, HeapTuple oldtuple);
+
+extern void lrep_write_rel(StringInfo out, Relation rel);
+extern void lrep_write_tuple(LREPOutputData *data, StringInfo out, Relation rel,
+		HeapTuple tuple);
+
+extern int lrep_read_begin(StringInfo in, XLogRecPtr *origlsn,
+		   TimestampTz *committime, TransactionId *remote_xid);
+extern int lrep_read_commit(StringInfo in, XLogRecPtr *commit_lsn,
+			XLogRecPtr *end_lsn, TimestampTz *committime);
+extern Relation lrep_read_insert(StringInfo in, LOCKMODE lockmode,
+ LREPTupleData *tuple);
+extern Relation lrep_read_update(StringInfo in, LOCKMODE lockmode,
+ LREPTupleData *oldtuple,
+ LREPTupleData *newtuple, bool *pkeysent);
+extern Relation lrep_read_delete(StringInfo in, LOCKMODE lockmode,
+ LREPTupleData *tuple, bool *pkeysent);
+
+extern void lrep_read_tuple_parts(StringInfo s, TupleDesc desc,
+  LREPTupleData *tuple);
+extern RangeVar *lrep_read_rel(StringInfo s);
+
+extern bool lrep_send_feedback(PGconn *conn, XLogRecPtr recvpos,
+			   XLogRecPtr writepos, XLogRecPtr flushpos,
+			   int64 now, bool force);
+
+extern void lrep_opt_parse_notnull(DefElem *elem, const char *paramtype);
+extern void lrep_opt_parse_uint32(DefElem *elem, uint32 *res);
+extern void lrep_opt_parse_size_t(DefElem *elem, size_t *res);
+extern void 

Re: [HACKERS] Commit fest 2014-12, let's begin!

2014-12-14 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 Let's begin the commit fest of December:
 https://commitfest.postgresql.org/action/commitfest_view?id=25
 ...
 To committers: there are 2 patches marked as ready for committer from
 previous commit fest:
 - Foreign table inheritance

I should probably take this one.

 - Point to polygon distance operator

I looked at that briefly during the last fest, but was unsure whether it
was too entangled with the GiST patches that Heikki was looking at.

regards, tom lane




Re: [HACKERS] [PATCH] Add transforms feature

2014-12-14 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 4/4/14 6:21 PM, Andres Freund wrote:

 + /* dependency on transform used by return type, if any */
 + if ((trfid = get_transform_oid(returnType, languageObjectId, true)))

 Should be compared to InvalidOid imo, rather than implicitly assuming
 that InvalidOid evaluates to false.

 I think it's widely assumed that InvalidOid is false.

That's not the point; the point is that some nonzero number of compilers
(and probably Coverity too) will warn about this construct, on the not
unreasonable grounds that = might be a typo for ==.  (Those extra parens
might satisfy gcc, but not other tools.)  Please put in an explicit
comparison of the assignment result, as is done in approximately 100% of
the other places where this idiom appears in Postgres.
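
That is, something like this (a sketch reusing the names from the quoted
hunk, with the body elided):

    /* dependency on transform used by return type, if any */
    if ((trfid = get_transform_oid(returnType, languageObjectId, true)) != InvalidOid)
    {
        /* ... record the dependency ... */
    }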

regards, tom lane




Re: [HACKERS] Commit fest 2014-12, let's begin!

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 - Point to polygon distance operator
 I looked at that briefly during the last fest, but was unsure whether it
 was too entangled with the GiST patches that Heikki was looking at.
From what I recall of looking at it this morning, the two are rather independent.
-- 
Michael




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-14 Thread Michael Paquier
Note: this patch has been moved to CF 2014-12 and I marked myself as
an author, if that's fine... I've ended up being really involved in
it.
-- 
Michael




Re: [HACKERS] On partitioning

2014-12-14 Thread Claudio Freire
On Sun, Dec 14, 2014 at 11:12 PM, Amit Langote
langote_amit...@lab.ntt.co.jp wrote:
 On egress you need some direct way to compare the scan quals with the
 partitioning values.  I would imagine this to be similar to how scan
 quals are compared to the values stored in a BRIN index: each scan qual
 has a corresponding operator strategy and a scan key, and you can say
 aye or nay based on a small set of operations that can be run
 cheaply, again without any proof or running arbitrary expressions.


 My knowledge of this is far from perfect, though, so to clear up any 
 confusion -

 As far as planning is concerned, I could not imagine how the index access 
 method way of pruning partitions could be made to work. Of course, I may 
 be missing something.

Let me be overly verbose; don't take it as patronizing, I'm just answering
in lots of detail why this could be a good idea to try.

Normal indexes store, roughly speaking, a pointer for each key value. So
a B-Tree gets you a set of TIDs for each key, and so do GIN and hash.

But BRIN does the mapping differently. BRIN stores a compact,
approximate representation of the set of keys within a page range. It
can tell, with some degree of (in)accuracy, whether a key or key range
could be part of that page range or not. The way it does this is
abstracted out, but at its core, it stores a compressed representation
of the key set that takes a constant number of bits to store, no matter
how many elements it covers. What changes as the element set it
represents grows is its accuracy.

Currently, BRIN only supports min-max representations. It will store,
for each page range, the minimum and maximum of some columns, and when
you query it, you can compare range vs range, and discard whole page
ranges.

Now, what are partitions, if not page ranges?

A BRIN tuple is a min-max pair. But BRIN is more general: it could use
other data structures to hold that compressed representation, if
someone implemented them, such as Bloom filters [0].

A BRIN index is a complex data structure because it has to account for
physically growing tables, but all the complexities vanish when you
fix a block range (the BR in BRIN) to a partition. Then, a mere
array of BRIN tuples would suffice.

BRIN already contains the machinery to turn quals into something that
filters out entire partitions, if you provide the BRIN tuples.

And you could even effectively maintain a BRIN index for the partitions
(just a BRIN tuple per partition, dynamically updated with every
insertion).

If you do that, you start with empty partitions, and each insert
updates the BRIN tuple. Avoiding concurrency loss in this case would
be tricky, but in theory this could allow very general partition
exclusion. In fact it could even work with constraint exclusion right
now: you'd have a single-tuple BRIN index for each partition and
benefit from it.

But you don't need to pay the price of updating BRIN indexes, as
min-max tuples for each partition can be produced while creating the
partitions if the syntax already provides the information. Then, it's
just a matter of querying this meta-data which just happens to have
the form of a BRIN tuple for each partition.
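
To make the idea concrete, here is a toy sketch (plain C with hypothetical
names, not BRIN's actual API) of a per-partition min-max tuple, covering
both the maintenance on insert and the pruning test against a scan qual of
the form lo <= key <= hi:

#include <stdbool.h>
#include <stdint.h>

typedef struct PartMinMax
{
    int32_t     min;    /* smallest key value seen in the partition */
    int32_t     max;    /* largest key value seen in the partition */
} PartMinMax;

/* Widening the stored range is all an insert has to do. */
static void
partition_note_insert(PartMinMax *p, int32_t key)
{
    if (key < p->min)
        p->min = key;
    if (key > p->max)
        p->max = key;
}

/*
 * Lossy test: true means the partition *may* contain keys in [lo, hi]
 * and must be scanned; false means it can be excluded outright.
 */
static bool
partition_may_match(const PartMinMax *p, int32_t lo, int32_t hi)
{
    return !(hi < p->min || lo > p->max);
}

Every partition for which partition_may_match() returns false is skipped,
which is the same aye-or-nay answer that constraint exclusion currently
derives from CHECK constraints.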

[0] http://en.wikipedia.org/wiki/Bloom_filter




[HACKERS] Comment typo in typedef struct BrinTuple

2014-12-14 Thread Amit Langote

Hi,

Find attached a patch that does:

-* mt_info is laid out in the following fashion:
+* bt_info is laid out in the following fashion:
<snip-comment>
uint8   bt_info;
} BrinTuple;

Thanks,
Amit


BrinTuple-typo-fix.patch
Description: Binary data



ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2014-12-14 Thread Kouhei Kaigai
This attached patch adds a code example of the custom-scan interface.

This custom-scan provider (ctidscan) performs almost the same as the
built-in SeqScan plan, but it can produce the same results with fewer
page reads when the qualifier of the relation scan has inequality
operators (<, <=, > or >=) on the ctid system column, because the
range to be scanned is then smaller than in a full-table scan.

Below is an example:

postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid < '(10,0)'::tid AND b like 
'%abc%';
 QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..10.00 rows=1 width=37)
   Filter: ((ctid < '(10,0)'::tid) AND (b ~~ '%abc%'::text))
(2 rows)

Once ctidscan is loaded, it offers a cheaper cost to scan the t1 table
than SeqScan, so the planner chooses the custom logic.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid < '(10,0)'::tid AND b like 
'%abc%';
   QUERY PLAN
-
 Custom Scan (ctidscan) on t1  (cost=0.00..5.00 rows=1 width=37)
   Filter: ((ctid < '(10,0)'::tid) AND (b ~~ '%abc%'::text))
   ctid quals: (ctid < '(10,0)'::tid)
(3 rows)
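
For readers unfamiliar with the mechanism, the heart of such a scan is a
simple TID comparison. Conceptually (a fragment, not the patch's actual
code, assuming tuple is the heap tuple just fetched and bound is the
ItemPointerData parsed from the qual):

    /*
     * ItemPointerCompare() (storage/itemptr.h) returns -1/0/+1.  Since a
     * sequential heap scan visits TIDs in increasing order, a qual of the
     * form "ctid < bound" lets the scan stop as soon as the current
     * tuple's TID reaches the bound.
     */
    if (ItemPointerCompare(&tuple->t_self, &bound) >= 0)
        break;      /* the rest of the heap is out of range */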

Like other query execution logic, it also provides an enable_ctidscan
parameter to turn this feature on and off.

I'm not certain whether we should have this functionality in contrib
from the perspective of the workloads it can help, but its main value
is as an example of the custom-scan interface.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Thursday, December 11, 2014 11:58 AM
 To: Simon Riggs
 Cc: Kaigai Kouhei(海外 浩平); Thom Brown; Kohei KaiGai; Tom Lane; Alvaro
 Herrera; Shigeru Hanada; Stephen Frost; Andres Freund; PgHacker; Jim
 Mlodgenski; Peter Eisentraut
 Subject: ##freemail## Re: [HACKERS] [v9.5] Custom Plan API
 
 On Tue, Dec 9, 2014 at 3:24 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Feedback I am receiving is that the API is unusable. That could be
  because it is impenetrable, or because it is unusable. I'm not sure it
  matters which.
 
 It would be nice to hear what someone is trying to use it for and what
 problems that person is encountering.  Without that, it's pretty much
 impossible for anyone to fix anything.
 
 As for sample code, KaiGai had a working example, which of course got broken
 when Tom changed the API, but it didn't look to me like Tom's changes would
 have made anything impossible that was possible before.
 I'm frankly kind of astonished by the tenor of this entire conversation;
 there is certainly plenty of code in the backend that is less
 self-documenting than this is; and KaiGai did already put up a wiki page
 with documentation as you requested.  From his response, it sounds like
 he has updated the ctidscan code, too.
 
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
 Company


pgsql-v9.5-contrib-ctidscan.v1.patch
Description: pgsql-v9.5-contrib-ctidscan.v1.patch
