Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Merlin Moncure
On Wed, Dec 16, 2015 at 1:29 PM, Tom Lane  wrote:
> I did some more experimentation and concluded that actually, this problem
> has nothing whatsoever to do with pager invocations.  What seems to really
> be happening is that libreadline activates its SIGWINCH handler only while
> it's being called to collect input, which is fine in itself, but *it does
> not notice screen resizes that happen when the handler is not active*.
>
> You can prove this by doing, say, "select pg_sleep(10);" and resizing
> the window before the backend comes back.  Readline never notices the
> resize, even though no pager invocation happens at all.
>
> So I now think that print.c shouldn't be involved at all, and the right
> thing to do is just have gets_interactive() invoke the resize function
> immediately before calling readline().  That will still leave a window
> for SIGWINCH to be missed, but it's a pretty tiny one.

right -- agreed.

> And somebody oughta report this as a libreadline bug ...

See https://bugs.python.org/issue23735 for background.  Apparently
this is expected behavior (and we are far from the only ones
complaining about it):

"And so we reach where we are. If a SIGWINCH arrives while readline is
not active, and the application using callback mode does not catch it and
tell readline about the updated screen dimensions (rl_set_screen_size()
exists for this purpose), readline will not know about the new size."

merlin




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Alvaro Herrera
Tom Lane wrote:
> I did some more experimentation and concluded that actually, this problem
> has nothing whatsoever to do with pager invocations.  What seems to really
> be happening is that libreadline activates its SIGWINCH handler only while
> it's being called to collect input, which is fine in itself, but *it does
> not notice screen resizes that happen when the handler is not active*.

I wonder if we're doing the proper things.  According to their manual,
http://git.savannah.gnu.org/gitweb/?p=readline.git;a=blob_plain;f=doc/readline.html;h=9b7dd842764c81ad496c38a2794361cad964ee90;hb=7628b745a813aac53586b640da056a975f1c443e#SEC44
I think we should be setting rl_catch_signals and/or rl_catch_sigwinch
and then perhaps call rl_set_signals(), but I don't think we're doing
anything of the sort.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Merlin Moncure  writes:
> On Wed, Dec 16, 2015 at 1:29 PM, Tom Lane  wrote:
>> So I now think that print.c shouldn't be involved at all, and the right
>> thing to do is just have gets_interactive() invoke the resize function
>> immediately before calling readline().  That will still leave a window
>> for SIGWINCH to be missed, but it's a pretty tiny one.

> right -- agreed.

I tried that and found out that the first call dumps core because readline
hasn't been initialized yet.  To fix that, we need an explicit call to
rl_initialize() instead of depending on readline() to do it for us.
So I arrive at the attached modified patch.

>> And somebody oughta report this as a libreadline bug ...

> See https://bugs.python.org/issue23735 for background.  Apparently
> this is expected behavior (and we are far from the only ones
> complaining about it):

> "And so we reach where we are. If a SIGWINCH arrives while readline is
> not active, and the application using callback mode does not catch it and
> tell readline about the updated screen dimensions (rl_set_screen_size()
> exists for this purpose), readline will not know about the new size."

Meh.  At the very least, this is a serious documentation failure on their
part, because their docs about interrupt handling look exactly the same
as before, to wit saying exactly the opposite of this.

regards, tom lane

diff --git a/configure b/configure
index 8c61390..5772d0e 100755
*** a/configure
--- b/configure
*** if test x"$pgac_cv_var_rl_completion_app
*** 13232,13238 
  $as_echo "#define HAVE_RL_COMPLETION_APPEND_CHARACTER 1" >>confdefs.h
  
  fi
!   for ac_func in rl_completion_matches rl_filename_completion_function
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 13232,13238 
  $as_echo "#define HAVE_RL_COMPLETION_APPEND_CHARACTER 1" >>confdefs.h
  
  fi
!   for ac_func in rl_completion_matches rl_filename_completion_function rl_reset_screen_size
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index b5868b0..44f832f 100644
*** a/configure.in
--- b/configure.in
*** LIBS="$LIBS_including_readline"
*** 1635,1641 
  
  if test "$with_readline" = yes; then
PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
!   AC_CHECK_FUNCS([rl_completion_matches rl_filename_completion_function])
AC_CHECK_FUNCS([append_history history_truncate_file])
  fi
  
--- 1635,1641 
  
  if test "$with_readline" = yes; then
PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
!   AC_CHECK_FUNCS([rl_completion_matches rl_filename_completion_function rl_reset_screen_size])
AC_CHECK_FUNCS([append_history history_truncate_file])
  fi
  
diff --git a/src/bin/psql/input.c b/src/bin/psql/input.c
index e377104..c0c5524 100644
*** a/src/bin/psql/input.c
--- b/src/bin/psql/input.c
*** gets_interactive(const char *prompt)
*** 65,70 
--- 65,81 
  	{
  		char	   *result;
  
+ 		/*
+ 		 * Some versions of readline don't notice SIGWINCH signals that arrive
+ 		 * when not actively reading input.  The simplest fix is to always
+ 		 * re-read the terminal size.  This leaves a window for SIGWINCH to be
+ 		 * missed between here and where readline() enables libreadline's
+ 		 * signal handler, but that's probably short enough to be ignored.
+ 		 */
+ #ifdef HAVE_RL_RESET_SCREEN_SIZE
+ 		rl_reset_screen_size();
+ #endif
+ 
  		/* Enable SIGINT to longjmp to sigint_interrupt_jmp */
  		sigint_interrupt_enabled = true;
  
*** initializeInput(int flags)
*** 330,335 
--- 341,347 
  		char		home[MAXPGPATH];
  
  		useReadline = true;
+ 		rl_initialize();
  		initialize_readline();
  
  		useHistory = true;
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index a20e0cd..16a272e 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 428,433 
--- 428,436 
  /* Define to 1 if you have the `rl_filename_completion_function' function. */
  #undef HAVE_RL_FILENAME_COMPLETION_FUNCTION
  
+ /* Define to 1 if you have the `rl_reset_screen_size' function. */
+ #undef HAVE_RL_RESET_SCREEN_SIZE
+ 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_SECURITY_PAM_APPL_H
  



Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
I wrote:
> Merlin Moncure  writes:
>> See https://bugs.python.org/issue23735 for background.  Apparently
>> this is expected behavior (and we are far from the only ones
>> complaining about it):

>> "And so we reach where we are. If a SIGWINCH arrives while readline is
>> not active, and the application using callback mode does not catch it and
>> tell readline about the updated screen dimensions (rl_set_screen_size()
>> exists for this purpose), readline will not know about the new size."

> Meh.  At the very least, this is a serious documentation failure on their
> part, because their docs about interrupt handling look exactly the same
> as before, to wit saying exactly the opposite of this.

Hm ... actually, the claim that this is somehow new behavior in
libreadline 6.3 or thereabouts is purest bilge, because 4.2a (released
Nov 2001) acts exactly the same.

regards, tom lane




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Andres Freund  writes:
> Hm. rl_reset_screen_size() works well for me. rl_resize_terminal() not
> so much.  Apparently the reason for that is that rl_reset_screen_size()
> doesn't set ignore_env to true when calling
> _rl_get_screen_size(). I've verified that just toggling that parameter
> changes the behaviour.

[ squint... ]  What readline version are you using, and do you have
LINES/COLUMNS set in your terminal environment?

As far as I can see in the 6.3 sources, the very first call of
_rl_get_screen_size will cause LINES/COLUMNS to become set from the
results of ioctl(TIOCGWINSZ), assuming that that succeeds, because we
do nothing to alter the default values of rl_prefer_env_winsize (0)
or rl_change_environment (1).  After that, it really doesn't matter
whether ignore_env is true or not; this code will track the ioctl
as long as rl_prefer_env_winsize stays 0.

It may be that the echo stuff is not good, but I'm pretty dubious
about ignore_env being the issue.

regards, tom lane




Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2015-12-16 Thread Robert Haas
On Wed, Dec 9, 2015 at 5:15 PM, Peter Geoghegan  wrote:
> On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas  wrote:
>> I find the references to a "void" representation in this patch to be
>> completely opaque.  I see that there are some such references in
>> tuplesort.c already, and most likely they were put there by commits
>> that I did, so I guess I have nobody but myself to blame, but I don't
>> know what this means, and I don't think we should let this terminology
>> proliferate.
>>
>> My understanding is that the "void" representation is intended to be
>> whatever Datum we originally got, which might be a pointer.  Why not
>> just say that instead of referring to it this way?
>
> That isn't what is intended. "void" is the state that macros like
> index_getattr() leave NULL leading attributes (that go in the
> SortTuple.datum1 field) in.

What kind of state is that?  Can't we define this in terms of what it
is rather than how it gets that way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2015-12-16 Thread Robert Haas
On Sat, Dec 12, 2015 at 8:03 AM, Amit Kapila  wrote:

>>  I think it might be
>> advantageous to have at least two groups because otherwise things
>> might slow down when some transactions are rolling over to a new page
>> while others are still in flight for the previous page.  Perhaps we
>> should try it both ways and benchmark.
>>
>
> Sure, I can do the benchmarks with both the patches, but before that
> if you can once check whether group_slots_update_clog_v3.patch is in line
> with what you have in mind then it will be helpful.

Benchmarking sounds good.  This looks broadly like what I was thinking
about, although I'm not very sure you've got all the details right.

Some random comments:

- TransactionGroupUpdateXidStatus could do just as well without
add_proc_to_group.  You could just say if (group_no >= NUM_GROUPS)
break; instead.  Also, I think you could combine the two if statements
inside the loop.  if (nextidx != INVALID_PGPROCNO &&
ProcGlobal->allProcs[nextidx].clogPage == proc->clogPage) break; or
something like that.

- memberXid and memberXidstatus are terrible names.  Member of what?
That's going to be clear as mud to the next person looking at the
definition of PGPROC.  And the capitalization of memberXidstatus isn't
even consistent.  Nor is nextupdateXidStatusElem.  Please do give some
thought to the names you pick for variables and structure members.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] checkpointer continuous flushing

2015-12-16 Thread Fabien COELHO


Hello Tomas,

> I'm planning to do some thorough benchmarking of the patches proposed in
> this thread, on various types of hardware (10k SAS drives and SSDs). But
> is that actually needed? I see Andres did some testing, as he posted a
> summary of the results on 11/12, but I don't see any actual results or
> even info about what benchmarks were done (pgbench?).
>
> If yes, do we only want to compare 0001-ckpt-14-andres.patch against
> master, or do we need to test one of Fabien's previous patches?


My 0.02€,

Although I disagree with some aspects of Andres' patch, I'm not a committer 
and I'm tired of arguing. I'm just planning to do minor changes to Andres' 
version to fix a potential issue if the file is closed while flushing is 
in progress, but that will not change the overall shape of it.


So testing on Andres version seems relevant to me.

For SSDs the performance impact should be limited. For disks it should be 
significant if there is no big cache in front of them. There were some 
concerns raised for some loads in the thread (shared memory smaller than 
needed, I think?); if you can include such cases that would be great. My 
guess is that it will not be very beneficial in that case because the 
writing is mostly done by the bgwriter & workers, and those writes are 
still random.


--
Fabien.


Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
I did some more experimentation and concluded that actually, this problem
has nothing whatsoever to do with pager invocations.  What seems to really
be happening is that libreadline activates its SIGWINCH handler only while
it's being called to collect input, which is fine in itself, but *it does
not notice screen resizes that happen when the handler is not active*.

You can prove this by doing, say, "select pg_sleep(10);" and resizing
the window before the backend comes back.  Readline never notices the
resize, even though no pager invocation happens at all.

So I now think that print.c shouldn't be involved at all, and the right
thing to do is just have gets_interactive() invoke the resize function
immediately before calling readline().  That will still leave a window
for SIGWINCH to be missed, but it's a pretty tiny one.

And somebody oughta report this as a libreadline bug ...

regards, tom lane




Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

2015-12-16 Thread Peter Geoghegan
On Wed, Dec 16, 2015 at 9:36 AM, Robert Haas  wrote:
>> That isn't what is intended. "void" is the state that macros like
>> index_getattr() leave NULL leading attributes (that go in the
>> SortTuple.datum1 field) in.
>
> What kind of state is that?  Can't we define this in terms of what it
> is rather than how it gets that way?

It's zeroed.

I guess we can update everything, including existing comments, to reflect that.
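
To spell that out against the existing API (a sketch; the variable names
are illustrative):

    bool    isnull;
    Datum   d = index_getattr(itup, 1, tupdesc, &isnull);

    /* If isnull is now true, d is simply (Datum) 0 -- the zeroed
     * state that ends up in SortTuple.datum1. */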

-- 
Peter Geoghegan




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Alvaro Herrera  writes:
>> I did some more experimentation and concluded that actually, this problem
>> has nothing whatsoever to do with pager invocations.  What seems to really
>> be happening is that libreadline activates its SIGWINCH handler only while
>> it's being called to collect input, which is fine in itself, but *it does
>> not notice screen resizes that happen when the handler is not active*.

> I wonder if we're doing the proper things.  According to their manual,
> http://git.savannah.gnu.org/gitweb/?p=readline.git;a=blob_plain;f=doc/readline.html;h=9b7dd842764c81ad496c38a2794361cad964ee90;hb=7628b745a813aac53586b640da056a975f1c443e#SEC44
> I think we should be setting rl_catch_signals and/or rl_catch_sigwinch
> and then perhaps call rl_set_signals(), but I don't think we're doing
> anything of the sort.

According to the info page I have here, those are all enabled by default,
and we would only need to do something special if we wanted to prevent
readline from catching signals.
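
For reference, the switches in question are plain readline globals, and per
the documentation both default to on:

    extern int rl_catch_signals;    /* default 1: readline installs handlers */
    extern int rl_catch_sigwinch;   /* default 1: readline handles SIGWINCH */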

It's possible that 6.3 has made fundamental, incompatible changes in
this area and not bothered to update the documentation, but it would be
pretty shoddy work on their part.

regards, tom lane




Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-16 Thread Robert Haas
On Tue, Nov 17, 2015 at 6:50 PM, Peter Geoghegan  wrote:
> On Tue, Nov 17, 2015 at 12:53 AM, Simon Riggs  wrote:
>> Short and sweet! Looks good.
>
> Thanks.
>
>> I would be inclined to add more comments to explain it, these things have a
>> habit of being forgotten.
>
> I'm not sure what additional detail I can add.
>
> I seem to be able to produce these sorting patches at a much greater
> rate than they can be committed, in part because Robert is the only
> one that ever reviews them, and he is only one person.

I object to that vicious slander.  I am at least three people, if not more!

Meanwhile, I did some simple benchmarking on your latest patch on my
MacBook Pro.  I did pgbench -i -s 100 and then tried:

create index x on pgbench_accounts (aid);
create index concurrently x on pgbench_accounts (aid);

The first took about 6.9 seconds.  The second took about 11.3 seconds
patched versus 14.6 seconds unpatched.  That's certainly a healthy
improvement.

I then rebuilt with --disable-float8-byval, because I was worried that
case might be harmed by this change.  It turns out we win in that case
too, just not by as much.  The time for the first case doesn't change
much, and neither does the unpatched time.  The patched time is about
12.9 seconds.

I have also reviewed the code, and it looks OK to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I wonder if we're doing the proper things.  According to their manual,
> http://git.savannah.gnu.org/gitweb/?p=readline.git;a=blob_plain;f=doc/readline.html;h=9b7dd842764c81ad496c38a2794361cad964ee90;hb=7628b745a813aac53586b640da056a975f1c443e#SEC44

Note: the above link documents readline 6.5 according to git tags.  This
one is for readline 4.2:
http://git.savannah.gnu.org/gitweb/?p=readline.git;a=blob_plain;f=doc/readline.html;hb=abde3125f6228a63e22de708b9edaef62cab0ac3#SEC43

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-16 Thread Tomas Vondra

Hi,

On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:


> I have the plans to make something from this on top of
> pg_stat_statements and auto_explain, as I've mentioned last time.  The
> next iteration will be based on the two latest patches above, so it
> still makes sense to review them.
>
> As for EXPLAIN ANALYZE support, it will require changes to core, but
> this can be done separately.


I'm re-reading the thread, and I have to admit I'm utterly confused about 
what the current plan is, what features it's supposed to provide, and whether 
it will solve the use case I'm most interested in. Oleksandr, could you 
please post a summary explaining that?


My use case for this functionality is debugging of long-running queries, 
particularly getting EXPLAIN ANALYZE for them. In such cases I can't simply 
run the EXPLAIN ANALYZE manually, because it will either run forever (just 
like the query) or may not produce the same plan (e.g. due to a recent 
ANALYZE). So we need to extract the data from the process 
executing the query.


I'm not essentially opposed to doing this in an extension (better an 
extension than nothing), but I don't quite see how you could do that 
from pg_stat_statements or auto_explain. AFAIK both extensions merely 
use hooks before/after the executor, and therefore can't do anything in 
between (while the query is actually running).


Perhaps you don't intend to solve this particular use case? Which use 
cases are you aiming to solve, then? Could you explain?



Maybe all we need to do is add another hook somewhere in the executor, 
and push the explain analyze into pg_stat_statements once in a while, 
entirely eliminating the need for inter-process communication (signals, 
queues, ...).


I'm pretty sure we don't need this for "short" queries, because in those 
cases we have other means to get the explain analyze (e.g. running the 
query again or auto_explain). So I can imagine having a rather high 
threshold (say, 60 seconds or even more), and we'd only push the explain 
analyze after crossing it. And then we'd only update it once in a while 
- say, again every 60 seconds.


Of course, this might be configurable by two GUCs:

   pg_stat_statements.explain_analyze_threshold = 60  # -1 is "off"
   pg_stat_statements.explain_analyze_refresh = 60
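
To make that concrete, these would be ordinary custom GUCs; a minimal 
sketch of registering the first one from _PG_init() (names, bounds, and 
defaults are illustrative -- nothing here exists yet):

    /* file scope; registration happens inside _PG_init() (utils/guc.h) */
    static int pgss_explain_analyze_threshold = 60;  /* seconds, -1 = off */

    DefineCustomIntVariable("pg_stat_statements.explain_analyze_threshold",
                            "Runtime (in seconds) after which a running "
                            "query's plan is captured; -1 disables.",
                            NULL,
                            &pgss_explain_analyze_threshold,
                            60,         /* boot value */
                            -1,         /* min */
                            INT_MAX,    /* max */
                            PGC_SUSET,
                            GUC_UNIT_S,
                            NULL, NULL, NULL);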

FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this 
than nothing.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Merlin Moncure
On Wed, Dec 16, 2015 at 12:06 PM, Andres Freund  wrote:
> On 2015-12-16 13:02:25 -0500, Tom Lane wrote:
>> I think the most reasonable way to handle this is to put the
>> call into a new function exported from input.c, where it can be
>> made conditional on useReadline.
>
> Sounds good.

All right, I'll follow up with that shortly.

For posterity, I think you guys arrived at the right conclusion, but
the difference between the two public screen reset calls is that one
(the original patch) forces a re-display of the prompt and the newer
one, rl_reset_screen_size(), does not.  With the older patch the best
I could come up with was to call rl_crlf() so that the prompt display
did not amend to the end of \timing's output.  However, that still
modified the output relative to what it was pre-patch, which would be
impolite to our users in a backpatch scenario.

merlin




Re: [HACKERS] Tab-completion for RLS

2015-12-16 Thread Robert Haas
On Fri, Dec 11, 2015 at 11:56 AM, Masahiko Sawada  wrote:
> On Thu, Dec 10, 2015 at 11:07 PM, Robert Haas  wrote:
>> On Tue, Dec 8, 2015 at 8:32 AM, Masahiko Sawada  
>> wrote:
>>> I found some lacks of tab-completion for RLS in 9.5.
>>>
>>> * ALTER POLICY [TAB]
>>> I expected to appear the list of policy name, but nothing is appeared.
>>>
>>> * ALTER POLICY hoge_policy ON [TAB]
>>> I expected to appear the list of table name related to specified policy, but
>>> all table names are appeared.
>>>
>>> * ALTER POLICY ... ON ... TO [TAB]
>>> I expected to appear { role_name | PUBLIC | CURRENT_USER | SESSION_USER },
>>> but only role_name and PUBLIC are appeared.
>>> Same problem is exists in
>>> "
>>> CREATE POLICY ... ON ... TO [TAB]
>>> "
>>> .
>>>
>>> #1 and #2 problems are exist in 9.5 or later, but #3 is exist in only 9.5
>>> because it's unintentionally fixed by
>>> 2f8880704a697312d8d10ab3a2ad7ffe4b5e3dfd commit.
>>> I think we should apply the necessary part of this commit for 9.5 as well,
>>> though?
>>>
>>> Attached patches are:
>>> * 000_fix_tab_completion_rls.patch
>>>   fixes #1, #2 problem, and is for master branch and REL9_5_STABLE.
>>> * 001_fix_tab_completion_rls_for_95.patch
>>>   fixes #3 problem, and is for only REL9_5_STABLE.
>>
>> I've committed 000 and back-patched it to 9.5.  I'm not quite sure
>> what to do about 001; maybe it's better to back-port the whole commit
>> rather than just bits of it.
>
> Yes, I agree with back-porting the whole commit.

On further review, this doesn't really seem like a sufficiently
critical issue to justify back-porting that commit.  I won't make a
stink if some other committer wants to push that commit into 9.5, but
I don't want to do it myself and then be left holding the bag if it
breaks something.  We're generally pretty lenient about pushing tab
completion patches into the tree even well after feature freeze, but
post-rc1 is a little more than I want to be on the hook for.  This is
clearly not a bug; it's just a feature that you'd like to have.  And
9.6 will have it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

2015-12-16 Thread Andres Freund
On 2015-12-11 10:08:32 -0800, Jeff Janes wrote:
> This is still in regular mode, correct?

Yes.

> I don't think this has ever worked.  Vacuum needs to start a
> transaction in order to record its update of datfrozenxid and
> relfrozenxid to the catalogs (or at least, starts one for something).
> Once you are within 1,000,000 of wraparound, you have to do the vacuum
> in single-user mode, you can no longer just wait for autovacuum to do
> its thing.  Otherwise the vacuum will do all the work of the vacuum,
> but then fail to clear the error condition.

But since we don't have, afaik, any code to handle that, we should still
be starting workers to do the truncation. Which isn't happening here.

But more importantly, it *should* be impossible to have that combination
of oldestXid values with the datfrozenxid values.

> Could the database have undergone a crash and recovery cycle?

Could, but I don't think it has before the problem occurred. The first
pgbench failure was about not being able to assign xids anymore.

Greetings,

Andres Freund




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andres Freund
On 2015-12-16 13:02:25 -0500, Tom Lane wrote:
> > If we really want to we could basically directly use
> > _rl_get_screen_size() - which seems to have been present from before
> > 4.0. It's not declared static...
> 
> Nah, I don't think we should rely on calling undocumented internal
> readline functions.

Agreed.


> We've lived with this behavior for long enough
> that I don't think it's a catastrophe if we don't fix it for ancient
> readline releases.

Yes.


> I think the most reasonable way to handle this is to put the
> call into a new function exported from input.c, where it can be
> made conditional on useReadline.

Sounds good.


> Does anyone object to back-patching it?

Not here.




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andres Freund
On 2015-12-16 12:02:26 -0500, Tom Lane wrote:
> [ squint... ]  What readline version are you using, and do you have
> LINES/COLUMNS set in your terminal environment?

libreadline-dev:
  Installed: 6.3-8+b4

Both are set - I think bash does that.

> It may be that the echo stuff is not good, but I'm pretty dubious
> about ignore_env being the issue.

Indeed it's not, I'm not entirely sure whether I managed to repeatedly
run with the wrong version of psql (doubtful, same terminal), or what
strange state I observed.

But
_rl_get_screen_size (fileno (rl_instream), 0);
if (RL_ISSTATE(RL_STATE_REDISPLAYING) == 0)
_rl_redisplay_after_sigwinch ();
in ClosePager() shows the problem independent of ignore_env or not.

I think the difference here is actually that redrawing shows the query
after the pager; without it we don't show it again.

If you configure less to automatically quit if the output is less than one
screen (-F) and use \pset pager always, the difference is clearly
visible:

resize (just _rl_get_screen_size()):

postgres[12561][1]=# SELECT 1;
┌──┐
│ ?column? │
├──┤
│1 │
└──┘
(1 row)

Time: 0.797 ms
postgres[12561][1]=# 

resize & redraw (_rl_get_screen_size() & _rl_redisplay_after_sigwinch):

postgres[12393][1]=# SELECT 1;
┌──┐
│ ?column? │
├──┤
│1 │
└──┘
(1 row)

postgres[12393][1]=# SELECT 1;Time: 0.555 ms

based on that I'm inclined to just go with resize - redisplaying the
query after the pager might be desirable, but I think it's an actual
behavioural change.






Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Andres Freund  writes:
> ... based on that I'm inclined to just go with resize - redisplaying the
> query after the pager might be desirable, but I think it's an actual
> behavioural change.

Hmm ... given that we've not printed "Time:" yet (in \timing mode),
I think you're right that we don't want anything printed at this point.
It may be that we can't fix this in readline versions that precede the
introduction of the resize function.  Let me go experiment on my pet
dinosaurs.

regards, tom lane




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andres Freund
On 2015-12-16 12:23:28 -0500, Tom Lane wrote:
> It may be that we can't fix this in readline versions that precede the
> introduction of the resize function.  Let me go experiment on my pet
> dinosaurs.

I'm not particularly bothered by not supporting old readline versions
here. If we really want to we could basically directly use
_rl_get_screen_size() - which seems to have been present from before
4.0. It's not declared static...

extern void _rl_get_screen_size (int tty, int ignore_env);
and then
_rl_get_screen_size (fileno (rl_instream), 0);
in ClosePager() seems to do the trick.

Andres




Re: [HACKERS] Cluster "stuck" in "not accepting commands to avoid wraparound data loss"

2015-12-16 Thread Robert Haas
On Fri, Dec 11, 2015 at 1:08 PM, Jeff Janes  wrote:
> Since changes to datfrozenxid are WAL logged at the time they occur,
> but the supposedly-synchronous change to ShmemVariableCache is not WAL
> logged until the next checkpoint, a well timed crash can leave you in
> the state where the system is in a tizzy about wraparound but each
> database says "Nope, not me".

ShmemVariableCache is an in-memory data structure, so it's going to
get blown away and rebuilt on a crash.  But I guess it gets rebuilt
from the contents of the most recent checkpoint record, so that
doesn't actually help.  However, I wonder if it would be safe for
the autovacuum launcher to calculate an updated value and call
SetTransactionIdLimit() to update ShmemVariableCache.

But I'm somewhat confused what this has to do with Andres's report.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Andres Freund  writes:
> On 2015-12-16 12:23:28 -0500, Tom Lane wrote:
>> It may be that we can't fix this in readline versions that precede the
>> introduction of the resize function.  Let me go experiment on my pet
>> dinosaurs.

> I'm not particularly bothered by not supporting old readline versions
> here.

I'm not either, just wanted to know if it was possible.  (Answer: no.
rl_resize_terminal definitely does not do what we want, neither in
current releases nor in 4.2.  It looks to me like it's only intended
to be called while interactive input is active, which is the only
time that libreadline has its signal handler in place.)

> If we really want to we could basically directly use
> _rl_get_screen_size() - which seems to have been present from before
> 4.0. It's not declared static...

Nah, I don't think we should rely on calling undocumented internal
readline functions.  We've lived with this behavior for long enough
that I don't think it's a catastrophe if we don't fix it for ancient
readline releases.

One issue I do have with the patch as proposed is that it will call
rl_reset_screen_size even with -n, which does not seem like a good
idea.  There's no guarantee that rl_reset_screen_size will behave
nicely if libreadline hasn't been initialized properly, and even
if it does, it will have side-effects on the LINES/COLUMNS environment
variables which are potentially user-visible and should not happen
with -n.  I think the most reasonable way to handle this is to put the
call into a new function exported from input.c, where it can be
made conditional on useReadline.
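
Concretely, something like this (a sketch only; the function name is
invented here, and the real patch may differ):

    /* in input.c */
    void
    pg_reset_readline_screen_size(void)
    {
    #ifdef HAVE_RL_RESET_SCREEN_SIZE
        if (useReadline)
            rl_reset_screen_size();
    #endif
    }

print.c's pager-close path would then call this instead of touching
readline directly.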

Except for that minor rearrangement, I think this is a good patch
and we should accept it.  Does anyone object to back-patching it?

regards, tom lane




Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Robert Haas
On Fri, Dec 11, 2015 at 4:44 AM, Ashutosh Bapat
 wrote:
> Hi All,
> postgres_fdw documentation says following about use_remote_estimate
> (http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
> --
> use_remote_estimate
> This option, which can be specified for a foreign table or a foreign server,
> controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost
> estimates. A setting for a foreign table overrides any setting for its
> server, but only for that table. The default is false.
> --
>
> I am trying to see how we should use this option in the context of join
> pushdown and, for
> that matter, any pushdown involving more than one table.
>
> I came up with following arguments
> 1. Foreign base relations derive their use_remote_estimate setting either
> from the server setting or the per table setting. A join between two foreign
> relations should derive its use_remote_estimate setting from the joining
> relations (recursively). This means that we will use EXPLAIN to estimate
> costs of join if "all" the involved base foreign relations have
> use_remote_estimate true (either they derive it from the server level
> setting or table level setting).
>
> 2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved
> base foreign relations have use_remote_estimate is true.
>
> 3. Since join between two foreign relations is not a table level phenomenon,
> but a server level phenomenon, we should use server level setting. This
> means that we will use EXPLAIN output to estimate costs of join if the
> foreign server has use_remote_estimate true, irrespective of the setting for
> individual foreign relations involved in that join.
>
> Unfortunately the documentation and comments in code do not say much about
> the intention (i.e. why and how is this setting expected to be used) of this
> setting in the context or server.
>
> The intention behind the server-level setting is more confusing. It does not
> override the table-level setting, so it is not intended to be used for a
> prohibitive reason, e.g. that the server doesn't support EXPLAIN the way it
> will be interpreted locally. It seems to act more like a default in case the
> table-level setting is absent. A user may set table-level use_remote_estimate
> to true if the cost of EXPLAIN is very small compared to that of a table scan
> (with or without indexes), or if adding conditional clauses to the query
> alters the costs so heavily that the cost of EXPLAIN itself is justified. But
> I can be wrong about these intentions.
>
> If we go by the above intention behind the table-level setting, the 2nd
> argument makes more sense, as the table for which use_remote_estimate is
> true can change the cost of the join heavily because of the clauses in the
> join, and it's better to get that from the foreign server than to guess it
> locally.
>
> Comments/suggestions are welcome.

I like option #2.  I don't really have a strong reason for that, but
it feels intuitive to me that we err on the side of using remote
estimates when in doubt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] checkpointer continuous flushing

2015-12-16 Thread Tomas Vondra

Hi,

I'm planning to do some thorough benchmarking of the patches proposed in 
this thread, on various types of hardware (10k SAS drives and SSDs). But 
is that actually needed? I see Andres did some testing, as he posted a 
summary of the results on 11/12, but I don't see any actual results or 
even info about what benchmarks were done (pgbench?).


If yes, do we only want to compare 0001-ckpt-14-andres.patch against 
master, or do we need to test one of Fabien's previous patches?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Tom Lane
Robert Haas  writes:
> I like option #2.  I don't really have a strong reason for that, but
> it feels intuitive to me that we err on the side of using remote
> estimates when in doubt.

If we believe that, why isn't the default value of use_remote_estimate true?
(Maybe it should be.)

Another option that should be considered is that joins should pay
attention only to the server-level setting and not to the individual
tables' settings.  This would surely be cheaper to implement, and
it avoids any questions about whether to OR or AND the individual
settings.

regards, tom lane




Re: [HACKERS] use_remote_estimate usage for join pushdown in postgres_fdw

2015-12-16 Thread Robert Haas
On Wed, Dec 16, 2015 at 1:11 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I like option #2.  I don't really have a strong reason for that, but
>> it feels intuitive to me that we err on the side of using remote
>> estimates when in doubt.
>
> If we believe that, why isn't the default value of use_remote_estimate true?
> (Maybe it should be.)
>
> Another option that should be considered is that joins should pay
> attention only to the server-level setting and not to the individual
> tables' settings.  This would surely be cheaper to implement, and
> it avoids any questions about whether to OR or AND the individual
> settings.

That was Ashutosh's option #3.

use_remote_estimate is a pretty expensive option, which is why it's
not on by default.  But if you are willing to spend that effort for a
scan of table A parameterized by a value from table B, it seems likely
to me that you are also willing to spend the effort to accurately cost
a pushed-down join of A and B.  Actually, it seems like it would be
more surprising if you weren't: we're willing to accurately cost
iterating the scan of B, but not pushing the whole join down?  Hmm.

That's an arguable position, of course.

Actually, I think that neither use_remote_estimate nor
!use_remote_estimate is a particularly great option.
!use_remote_estimate produces results that are basically pulled out of
a hat.  use_remote_estimate produces good estimates, but it's pretty
expensive for a planning operation.  I'd like to have some other
alternative, like a local cache of metadata that we can consult when
!use_remote_estimate instead of just making things up, which might
tell us things like what indexes exist on the remote side.  But that's
a different project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-12-16 Thread David Rowley
On 17 December 2015 at 05:02, Tomas Vondra wrote:

> 0) I know the patch does not tweak costing - any plans in this
>    direction? Would it be possible to simply use the costing used by
>    semijoin?
>
Many thanks for looking at this.

The patch does tweak the costing so that unique joins are costed in the
same way as semi joins. It's a very small change, so easy to miss.

For example, see the change in initial_cost_nestloop()

-   if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+   if (jointype == JOIN_SEMI ||
+   jointype == JOIN_ANTI ||
+   unique_inner)
{

Also see the changes in final_cost_nestloop() and final_cost_hashjoin()


> 1) nodeHashjoin.c (and other join nodes)
>
> I've noticed we have this in the ExecHashJoin() method:
>
>  /*
>   * When the inner side is unique or we're performing a
>   * semijoin, we'll consider returning the first match, but
>   * after that we're done with this outer tuple.
>   */
>  if (node->js.unique_inner)
>  node->hj_JoinState = HJ_NEED_NEW_OUTER;
>
> That seems a bit awkward because the comment speaks about unique joins
> *OR* semijoins, but the check only references unique_inner. That of course
> works because we do this in ExecInitHashJoin():
>
> switch (node->join.jointype)
> {
> case JOIN_SEMI:
> hjstate->js.unique_inner = true;
> /* fall through */
>
> Either some semijoins may not be unique - in that case the naming is
> misleading at best, and we either need to find a better name or simply
> check two conditions like this:
>
>  if (node->js.unique_inner || node->join.type == JOIN_SEMI)
> node->hj_JoinState = HJ_NEED_NEW_OUTER;
>
> Or all semijoins are unique joins, and in that case the comment may need
> rephrasing. But more importantly, it begs the question why we're detecting
> this in the executor and not in the planner? Because if we detect it in
> executor, we can't use this bit in costing, for example.
>
>
The reason not to detect that in the planner is simply that unique_join is
meant to mean that the relation on the inner side of the join will at most
contain a single Tuple which matches each outer relation tuple, based on
the join condition. I've added no detection for this in semi joins, and I'd
rather not go blindly setting the flag to true in the planner as it simply
may not be true for the semi join. At the moment that might not matter as
we're only using the unique_join flag as an optimization in the join nodes,
but I'd prefer not to do this as its likely we'll want to do more with this
flag later, and I'd rather we keep the meaning well defined. You might
argue that this is also true for semi joins, but down the road somewhere we
may want to perform some checks on the inner relation before the join takes
place, and in that case the tuples of the relation might not have the same
properties we claim they do.

But you're right that reusing the flag in the join nodes is not ideal, and
the comment is not that great either. I'd really rather go down the path of
either renaming the variable, or explaining this better in the comment. It
seems unnecessary to check both for each tuple being joined. I'd imagine
that might add a little overhead to joins which are not unique.

How about changing the comment to:

/*
* In the event that the inner side has been detected to be
* unique, as an optimization we can skip searching for any
* subsequent matching inner tuples, as none will exist.
* For semijoins unique_inner will always be true, although
* in this case we don't match another inner tuple as this
* is the required semi join behavior.
*/

Alternatively or additionally we can rename the variable in the executor
state, although I've not thought of a name yet that I don't find overly
verbose: unique_inner_or_semi_join,  match_first_tuple_only.


> 2) analyzejoins.c
>
> I see that this code in join_is_removable()
>
> innerrel = find_base_rel(root, innerrelid);
>
> if (innerrel->reloptkind != RELOPT_BASEREL)
> return false;
>
> was rewritten like this:
>
> innerrel = find_base_rel(root, innerrelid);
>
> Assert(innerrel->reloptkind == RELOPT_BASEREL);
>
> That suggests that previously the function expected cases where reloptkind
> may not be RELOPT_BASEREL, but now it'll error out int such cases. I
> haven't noticed any changes in the surrounding code that'd guarantee this
> won't happen, but I also haven't been able to come up with an example
> triggering the assert (haven't been trying too hard). How do we know the
> assert() makes sense?
>
>
I'd have changed this as it should be covered by the "if
(!sjinfo->is_unique_join ||" a few lines up. mark_unique_joins() only sets
is_unique_join to true if specialjoin_is_unique_join() returns true, and
that function tests reloptkind to ensure it's set to RELOPT_BASEREL and
returns false if the relation is not. Perhaps what is missing 

Re: [HACKERS] Should TIDs be typbyval = FLOAT8PASSBYVAL to speed up CREATE INDEX CONCURRENTLY?

2015-12-16 Thread Peter Geoghegan
On Wed, Dec 16, 2015 at 12:28 PM, Robert Haas  wrote:
>> I seem to be able to produce these sorting patches at a much greater
>> rate than they can be committed, in part because Robert is the only
>> one that ever reviews them, and he is only one person.
>
> I object to that vicious slander.  I am at least three people, if not more!

I was referring only to the Robert that reviews my sorting patches.  :-)

> Meanwhile, I did some simple benchmarking on your latest patch on my
> MacBook Pro.  I did pgbench -i -s 100 and then tried:
>
> create index x on pgbench_accounts (aid);
> create index concurrently x on pgbench_accounts (aid);
>
> The first took about 6.9 seconds.  The second took about 11.3 seconds
> patched versus 14.6 seconds unpatched.  That's certainly a healthy
> improvement.

That seems pretty good. It probably doesn't matter, but FWIW it's
likely that your numbers are not as good as mine because this ends up
with a perfect logical/physical correlation, which the quicksort
precheck [1] does very well on when sorting the TIDs (since input is
*perfectly* correlated, as opposed to 99.99% correlated, a case that
does poorly [2]).

> I have also reviewed the code, and it looks OK to me, so committed.

Thanks!

[1] Commit a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649
[2] http://www.postgresql.org/message-id/54eb580c.2000...@2ndquadrant.com
-- 
Peter Geoghegan




Re: [HACKERS] Reducing ClogControlLock contention

2015-12-16 Thread Simon Riggs
On 22 August 2015 at 15:14, Andres Freund  wrote:


> TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
> writes an 8 byte variable (the lsn). That's not safe.
>

This point was the main sticking point here.

It turns out that we don't need to store the LSN (8 bytes).

WAL is fsynced every time we finish writing a file, so we only actually
need to store the byte position within each file, so no more than 16MB.
That fits within a 4 byte value, so can be written atomically.
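
For the record, the arithmetic: a 16MB segment spans 2^24 byte positions,
which fits in a 4-byte integer with plenty of room. As a sketch (the type
name is invented for illustration):

    #include <stdint.h>

    typedef uint32_t WalSegOffset;      /* hypothetical */

    _Static_assert((16u << 20) - 1 <= UINT32_MAX,
                   "an offset within a 16MB WAL segment fits in 4 bytes");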

So I have a valid way forward for this approach. Cool.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Remove array_nulls?

2015-12-16 Thread Robert Haas
On Tue, Dec 15, 2015 at 1:26 AM, Michael Paquier
 wrote:
> On Tue, Dec 15, 2015 at 2:57 AM, Jim Nasby  wrote:
>> On 12/11/15 2:57 PM, Tom Lane wrote:
>>> Jim Nasby  writes:
>>> Perhaps, but I'd like to have a less ad-hoc process about it.  What's
>>> our policy for dropping backwards-compatibility GUCs?  Are there any
>>> others that should be removed now as well?
>>
>>
>> Perhaps it should be tied to bumping the major version number, which I'm
>> guessing would happen next whenever we get parallel query execution. If we
>> do that, a reasonable policy might be that a compatibility GUC lives across
>> no more than 1 major version bump (ie, we wouldn't remove something in 9.0
>> that was added in 8.4).
>
> Another possibility may be to link that with the 5-year maintenance
> window of the community: a compatibility GUC is dropped in the following
> major release if the oldest stable version maintained is the one that
> introduced it. Just an idea.

Yeah, there's something to be said for that, although to be honest in
most cases I'd prefer to wait longer.   I wonder about perhaps
planning to drop things after two lifecycles.  That is, when the
release where the incompatibility was added goes out of support, we
don't do anything, but when the release that was current when it went
out of support is itself out of support, then we drop the GUC.  For
example, 8.2 went EOL in December 2011, at which point the newest
release was 9.1.  So when 9.1 is out of support, then we could drop
it; that's due to happen this September.  So 9.6 (or 10.0, if we call
it that) could drop it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] A Typo in regress/sql/privileges.sql

2015-12-16 Thread Tatsuro Yamada
Hi,

This is my first post to -hackers.

I found typos in privileges.sql and privileges.out
Please find attached a patch.

Best regards,
Tatsuro Yamada

*** a/src/test/regress/expected/privileges.out
--- b/src/test/regress/expected/privileges.out
***
*** 390,396  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- f
  ERROR:  permission denied for relation atest5
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT)
  ERROR:  permission denied for relation atest5
! -- Check that the the columns in the inference require select privileges
  -- Error. No privs on four
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10;
  ERROR:  permission denied for relation atest5
--- 390,396 
  ERROR:  permission denied for relation atest5
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT)
  ERROR:  permission denied for relation atest5
! -- Check that the columns in the inference require select privileges
  -- Error. No privs on four
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10;
  ERROR:  permission denied for relation atest5
*** a/src/test/regress/sql/privileges.sql
--- b/src/test/regress/sql/privileges.sql
***
*** 259,265  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLU
  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLUDED.three;
  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- fails (due to UPDATE)
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT)
! -- Check that the the columns in the inference require select privileges
  -- Error. No privs on four
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10;
  
--- 259,265 
  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLUDED.three;
  INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- fails (due to UPDATE)
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT)
! -- Check that the columns in the inference require select privileges
  -- Error. No privs on four
  INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10;
  



Re: [HACKERS] parallel joins, and better parallel explain

2015-12-16 Thread Amit Kapila
On Wed, Dec 16, 2015 at 9:55 PM, Dilip Kumar  wrote:

> On Wed, Dec 16, 2015 at 6:20 PM Amit Kapila 
> wrote:
>
> >On Tue, Dec 15, 2015 at 7:31 PM, Robert Haas 
> wrote:
> >>
> >> On Mon, Dec 14, 2015 at 8:38 AM, Amit Kapila 
> wrote:
>
> > In any case,
> >I have done some more investigation of the patch and found that even
> >without changing query planner related parameters, it seems to give
> >bad plans (as in example below [1]).  I think here the costing of rework
> each
>
> I have done some more testing using TPC-H benchmark (For some of the
> queries, specially for Parallel Hash Join), and Results summary is as below.
>
>
> Planning Time (ms)
>
> Query      Base    Patch
> TPC-H Q2   2.2     2.4
> TPC-H Q3   0.67    0.71
> TPC-H Q5   3.17    2.3
> TPC-H Q7   2.43    2.4
>
> Execution Time (ms)
>
> Query      Base    Patch
> TPC-H Q2   2826    766
> TPC-H Q3   23473   24271
> TPC-H Q5   21357   1432
> TPC-H Q7   6779    1138
> All test files and detailed plan output are attached to the mail.
> q2.sql, q3.sql, q5.sql and q7.sql are the TPC-H benchmark's 2nd, 3rd, 5th
> and 7th queries,
> and results with base and parallel join are attached in q*_base.out and
> q*_parallel.out respectively.
>
> Summary: With the TPC-H queries, wherever Hash Join is pushed under a
> Gather node, significant improvement is visible;
> with Q2, using 3 workers, the time consumed is almost 1/3 of the base.
>
>
> I observed one problem with Q5 and Q7: some relation and snapshot
> references are leaked and I am getting the below warning; haven't yet
> looked into the issue.
>
>
While looking at the plans of Q5 and Q7, I have observed that a Gather node
is pushed below another Gather node, which we don't have an appropriate
way of dealing with.  I think that could be the reason why you are seeing
the errors.

Also, I think it would be good if you could once check the plan/execution
time with max_parallel_degree=0, as that can give us base reference
data without parallelism.  I am also wondering whether you have changed
any other parallel-cost-related parameters?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Making tab-complete.c easier to maintain

2015-12-16 Thread Kyotaro HORIGUCHI
Hello.

Ok, I withdraw the minilang solution and I'll go with Thomas's approach,
which is still good to have.

At Mon, 14 Dec 2015 14:13:02 +0900, Michael Paquier  
wrote in 
> On Mon, Dec 14, 2015 at 8:10 AM, Thomas Munro
>  wrote:
> > I've also add (very) primitive negative pattern support and used it in
> > 5 places.  Improvement?  Examples:
> >
> > /* ALTER TABLE xxx RENAME yyy */
> > -   else if (TailMatches5("ALTER", "TABLE", MatchAny, "RENAME", 
> > MatchAny) &&
> > -!TailMatches1("CONSTRAINT|TO"))
> > +   else if (TailMatches5("ALTER", "TABLE", MatchAny, "RENAME",
> > "!CONSTRAINT|TO"))
> > COMPLETE_WITH_CONST("TO");
> >
> > /* If we have CLUSTER , then add "USING" */
> > -   else if (TailMatches2("CLUSTER", MatchAny) &&
> > !TailMatches1("VERBOSE|ON"))
> > +   else if (TailMatches2("CLUSTER", "!VERBOSE|ON"))
> > COMPLETE_WITH_CONST("USING");
> 
> +   /* Handle negated patterns. */
> +   if (*pattern == '!')
> +   return !word_matches(pattern + 1, word);
> 
> Yeah, I guess that's an improvement for those cases, and there is no
> immediate need for a per-keyword NOT operator in our cases to allow
> things of the type (foo1 OR NOT foo2). Still, in the case of this
> patch !foo1|foo2 actually means (NOT foo1 AND NOT foo2). It does not
> seem that much intuitive. Reading straight this expression it seems
> that !foo1|foo2 means actually (NOT foo1 OR foo2) because the lack of
> parenthesis. Thoughts?

I used just the same expression as Thomas in my patch since it
was intuitive enough in this context in my view. The expression
"(!FOO1)|FOO2" is nonsense in the context of tab-completion and
won't be used in the future. But it is true that "!(FOO|BAR|BAZ)" is
clearer than without the parentheses.

We could use other characters as the operator, but that might also
make the meaning a bit difficult to read.

"~FOO|BAR|BAZ", "-FOO|BAR|BAZ"


"TailMatches2("CLUSTER", NEG "VERBOSE|ON")" is mere a syntax
sugar but reduces the uneasiness. But rather longer than adding
parenthesis.

As a result, I vote for "!(FOO|BAR|BAZ)" first, then "-FOO|BAR|BAZ",
for now.
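
To be concrete, here is a rough sketch of how word_matches() could
accept the parenthesized form on top of the bare "!FOO|BAR" handling
quoted above. word_matches_alternation() is just a stand-in for the
existing positive-match logic, not a function from the patch:

    static bool
    word_matches(const char *pattern, const char *word)
    {
        if (pattern[0] == '!')
        {
            size_t  len = strlen(pattern + 1);

            /* accept !(FOO|BAR|BAZ) by stripping the parentheses */
            if (pattern[1] == '(' && len >= 2 && pattern[len] == ')')
            {
                char   *inner = pg_strdup(pattern + 2);
                bool    matched;

                inner[len - 2] = '\0';  /* chop the trailing ')' */
                matched = word_matches_alternation(inner, word);
                free(inner);
                return !matched;
            }
            /* bare !FOO|BAR form, as in Thomas's patch */
            return !word_matches_alternation(pattern + 1, word);
        }
        return word_matches_alternation(pattern, word);
    }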



In addition to that, I feel that successive "MatchAny"s are a bit
bothersome.

>  TailMatches6("COMMENT", "ON", MatchAny, MatchAny, MatchAny, MatchAny)) &&
>  !TailMatches1("IS")

Is MatchAnyN acceptable? One concern is that the two N's
(in TailMatchesN and MatchAnyN) look a bit confusing.

>  TailMatches4("COMMENT", "ON", MatchAny3, "!IS")

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Michael Paquier
On Thu, Dec 10, 2015 at 3:31 AM, Robert Haas  wrote:
> On Mon, Nov 30, 2015 at 12:58 PM, Bruce Momjian  wrote:
>> On Mon, Nov 30, 2015 at 10:48:04PM +0530, Masahiko Sawada wrote:
>>> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes  wrote:
>>> > On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada  
>>> > wrote:
>>> >>
>>> >> Yeah, we need to consider to compute checksum if enabled.
>>> >> I've changed the patch, and attached.
>>> >> Please review it.
>>> >
>>> > Thanks for the update.  This now conflicts with the updates doesn to
>>> > fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
>>> > conflict in order to do some testing, but I'd like to get an updated
>>> > patch from the author in case I did it wrong.  I don't want to find
>>> > bugs that I just introduced myself.
>>> >
>>>
>>> Thank you for having a look.
>>
>> I would not bother mentioning this detail in the pg_upgrade manual page:
>>
>> +   Since the format of visibility map has been changed in version 9.6,
>> +   pg_upgrade creates and rewrite new 
>> '_vm'
>> +   file even if upgrading from 9.5 or before to 9.6 or later with link mode 
>> (-k).
>
> Really?  I know we don't always document things like this, but it
> seems like a good idea to me that we do so.

Just going though v30...

+frozen. The whole-table freezing is occuerred only when all pages happen to
+require freezing to freeze rows. In other cases such as where

I am not really getting the meaning of this sentence. Shouldn't this
be reworded something like:
"Freezing occurs on the whole table once all pages of this relation require it."

+relfrozenxid is more than
vacuum_freeze_table_age
+transcations old, where VACUUM's FREEZE
option is used,
+VACUUM can skip the pages that all tuples on the page
itself are
+marked as frozen.
+When all pages of table are eventually marked as frozen by
VACUUM,
+after it's finished age(relfrozenxid) should be a little more
+than the vacuum_freeze_min_age setting that was used (more by
+the number of transcations started since the VACUUM started).
+If the advancing of relfrozenxid is not happend until
+autovacuum_freeze_max_age is reached, an autovacuum will soon
+be forced for the table.

s/transcations/transactions.

+ n_frozen_page
+ integer
+ Number of frozen pages
n_frozen_pages?

make check with pg_upgrade is failing for me:
Visibility map rewriting test failed
make: *** [check] Error 1

-GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
+GetVisibilitymapPins(Relation relation, Buffer buffer1, Buffer buffer2,
This looks like an unrelated change.

  * Clearing a visibility map bit is not separately WAL-logged.  The callers
  * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
+ * replay of the updating operation as well.  And all-frozen bit must be
+ * cleared with all-visible at the same time.
This could be reformulated. This is just an addition on top of the
existing paragraph.

+ * The visibility map has the all-frozen bit which indicates all tuples on
+ * corresponding page has been completely frozen, so the visibility map is also
"have been completely frozen".

-/* Mapping from heap block number to the right bit in the visibility map */
-#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
-#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) /
HEAPBLOCKS_PER_BYTE)
Those two declarations are just noise in the patch: those definitions
are unchanged.

-   elog(DEBUG1, "vm_clear %s %d", RelationGetRelationName(rel), heapBlk);
+   elog(DEBUG1, "vm_clear %s block %d",
RelationGetRelationName(rel), heapBlk);
This may be better as a separate patch.

-visibilitymap_count(Relation rel)
+visibilitymap_count(Relation rel, BlockNumber *all_frozen)
I think that this routine would gain in clarity if reworked as follows:
visibilitymap_count(Relation rel, BlockNumber *all_visible,
BlockNumber *all_frozen)

+   /*
+* Report ANALYZE to the stats collector, too.
However, if doing
+* inherited stats we shouldn't report, because the
stats collector only
+* tracks per-table stats.
+*/
+   if (!inh)
+   pgstat_report_analyze(onerel, totalrows,
totaldeadrows, relallfrozen);
Here we already know that this is working in the non-inherited code
path. As a whole, why that? Why isn't relallfrozen passed as an
argument of vac_update_relstats and then inserted in pg_class? Maybe I
am missing something..

+* mxid full-table scan limit. During full scan, we could skip some pags
+* according to all-frozen bit of visibility map.
s/pags/pages

+* Also, skipping even a single page accorinding to all-visible bit of
s/accorinding/according.

So, lazy_scan_heap is the central and 

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-12-16 Thread Michael Paquier
On Sun, Oct 4, 2015 at 2:07 AM, Peter Geoghegan  wrote:
> On Sat, Oct 3, 2015 at 6:38 AM, Andres Freund  wrote:
>> I'm not arguing against any of this - but I don't think this needs to be
>> on the 9.5 open items list. I plan to remove from there.
>
> Obviously I don't think that this is a critical fix. I do think that
> it would be nice to keep the branches in sync, and that might become a
> bit more difficult after 9.5 is released.

(A couple of months later)
This is not an actual fix, but an optimization, no?
UNIQUE_CHECK_SPECULATIVE is just used to optimize a couple of code
paths in the case of an insert conflicting during btree insertion...

In any case, at this point 9.5 is really aimed to be stabilized, so
targeting only master is a far saner approach IMO for this patch.
Pushing that in 9.5 a couple of months back may have given enough
reason to do so... But well life is life.

+* it later
Missing a dot here :)

+* Set checkedIndex here, since partial unique index
will still count
+* as a found arbiter index despite being skipped due
to predicate not
+* being satisfied
Ditto.
-- 
Michael




Re: [HACKERS] extend pgbench expressions with functions

2015-12-16 Thread Michael Paquier
On Wed, Dec 16, 2015 at 3:07 PM, Fabien COELHO  wrote:
>
>> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas 
>> wrote:
>>>
>>> It looks fine to me except that I think we should spell out "param" as
>>> "parameter" throughout, instead of abbreviating.
>>
>>
>> Fine for me. I have updated the first patch as attached (still looking
>> at the second).
>
>
> This doc update threshold -> parameter & sgml and C code looks ok to me.

So I am having a look at the second patch...

+  double constants such as 3.14156,
You meant perhaps 3.14159 :)

+   max(i,
...)
+   integer
Such function declarations are better with optional arguments listed
within brackets.

+  
+   double(i)
+   double
+   cast to double
+   double(5432)
+   5432.0
+  
+  
+   int(x)
+   integer
+   cast to int
+   int(5.4 + 3.8)
+   9
+  
If there are cast functions, why doesn't this patch introduce the
concept of a data type for a variable defined in a script? Both
concepts are linked, and for now, as far as I can see, the allocation
of a \set variable is actually always an integer.

In consequence, sqrt behavior is a bit surprising, for example this script:
\set aid sqrt(2.0)
SELECT :aid;
returns that:
SELECT 1;
Shouldn't a user expect 1.414 instead? Fabien, am I missing a trick?

It seems to me that this patch would gain in clarity by focusing a bit
more on the infrastructure first and removing a couple of things that
are not that mandatory for now... So the following things are not
necessary as of now:
- cast functions from/to int/double, because a result variable of a
\set does not include the idea that a result variable can be something
other than an integer. At least no option is given to the user to
make direct use of a double value.
- functions that return a double number: sqrt, pi
- min and max have value because they allow a user to specify an
expression once as a variable instead of writing it perhaps multiple
times in SQL; still, is that enough to justify having them as a set of
functions available by default? I am not really sure either.

Really, I like this patch, and I think that it is great to be able to
use a set of functions and methods within a pgbench script because
this clearly can bring more clarity for a developer; still, it seems
that this patch is trying to do too many things at the same time:
1) Add infrastructure to support function calls and refactor the
existing functions to use it. This is the FUNCTION stuff in the
expression scanner.
2) Add the concept of double return type, this could be an extension
of \set with a data type, or a new command like \setdouble. This
should include as well the casting routines I guess. This is the
DOUBLE stuff in the expression scanner.
3) Introduce new functions, as needed.

1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd
like to have... sqrt, for example, has value if a variable can be set
as a double type.
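
For instance, a hypothetical script under point 2) could look like this
(\setdouble is not an existing command, just the strawman name from
above, and int() is one of the casts proposed in the patch):

    \setdouble d sqrt(2.0)
    \set aid int(:d * 1000)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Here :d would keep its fractional part, and the explicit cast would make
the truncation visible instead of silent.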

In conclusion, for this CF the patch doing the documentation fixes is
"Ready for committer", the second patch introducing the function
infrastructure should focus more on its core structure and should be
marked as "Returned with feedback". Opinions are welcome.
Regards,
-- 
Michael




Re: [HACKERS] extend pgbench expressions with functions

2015-12-16 Thread Alvaro Herrera
Michael Paquier wrote:
> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas  wrote:
> > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier 
> >  wrote:
> >> I have looked for now at the first patch and finished with the
> >> attached while looking at it. Perhaps a committer could look already
> >> at that?
> >
> > It looks fine to me except that I think we should spell out "param" as
> > "parameter" throughout, instead of abbreviating.
> 
> Fine for me. I have updated the first patch as attached (still looking
> at the second).

hm, so this is to backpatch, not merely for master, yes?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] extend pgbench expressions with functions

2015-12-16 Thread Michael Paquier
On Thu, Dec 17, 2015 at 1:54 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas  wrote:
>> > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier 
>> >  wrote:
>> >> I have looked for now at the first patch and finished with the
>> >> attached while looking at it. Perhaps a committer could look already
>> >> at that?
>> >
>> > It looks fine to me except that I think we should spell out "param" as
>> > "parameter" throughout, instead of abbreviating.
>>
>> Fine for me. I have updated the first patch as attached (still looking
>> at the second).
>
> hm, so this is to backpatch, not merely for master, yes?

I haven't thought about that as it is a cosmetic patch.. But yes
that's harmless to backpatch to 9.5, and it would actually be good to
get a consistent code base with master I guess.
-- 
Michael




Re: [HACKERS] Improving replay of XLOG_BTREE_VACUUM records

2015-12-16 Thread Michael Paquier
On Thu, Sep 3, 2015 at 6:10 AM, Vladimir Borodin wrote:
> Patch with implementation attached.
> Sorry for delay, I will link it to the current commitfest.

This patch looks correct, and does what it claims to do. It is
also not really worth refactoring the bit of code in PrefetchBuffer
that does a similar operation, so I am marking it as ready for
committer.
-- 
Michael




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-12-16 Thread Simon Riggs
On 17 December 2015 at 00:17, Tomas Vondra 
wrote:


> 1) nodeHashjoin.c (and other join nodes)
>>
>> I've noticed we have this in the ExecHashJoin() method:
>>
>>   /*
>>* When the inner side is unique or we're performing a
>>* semijoin, we'll consider returning the first match, but
>>* after that we're done with this outer tuple.
>>*/
>>   if (node->js.unique_inner)
>>   node->hj_JoinState = HJ_NEED_NEW_OUTER;
>>
>> That seems a bit awkward because the comment speaks about unique
>> joins *OR* semijoins, but the check only references unique_inner.
>> That of course works because we do this in ExecInitHashJoin():
>>
>>  switch (node->join.jointype)
>>  {
>>  case JOIN_SEMI:
>>  hjstate->js.unique_inner = true;
>>  /* fall through */
>>
>> Either some semijoins may not be unique - in that case the naming is
>> misleading at best, and we either need to find a better name or
>> simply check two conditions like this:
>>
>>   if (node->js.unique_inner || node->join.type == JOIN_SEMI)
>>  node->hj_JoinState = HJ_NEED_NEW_OUTER;
>>
>> Or all semijoins are unique joins, and in that case the comment may
>> need rephrasing. But more importantly, it begs the question why
>> we're detecting this in the executor and not in the planner? Because
>> if we detect it in executor, we can't use this bit in costing, for
>> example.
>>
>>
>> The reason not to detect that in the planner is simply that unique_join
>> is meant to mean that the relation on the inner side of the join will at
>> most contain a single Tuple which matches each outer relation tuple,
>> based on the join condition. I've added no detection for this in semi
>> joins, and I'd rather not go blindly setting the flag to true in the
>> planner as it simply may not be true for the semi join. At the moment
>> that might not matter as we're only using the unique_join flag as an
>> optimization in the join nodes, but I'd prefer not to do this as its
>> likely we'll want to do more with this flag later, and I'd rather we
>> keep the meaning well defined. You might argue that this is also true
>> for semi joins, but down the road somewhere we may want to perform some
>> checks on the inner relation before the join takes place, and in that
>> case the Tuples of the relation might not have the same properties we
>> claim they do.
>>
>> But you're right that reusing the flag in the join nodes is not ideal,
>> and the comment is not that great either. I'd really rather go down the
>> path of either renaming the variable, or explaining this better in the
>> comment. It seems unnecessary to check both for each tuple being joined.
>> I'd imagine that might add a little overhead to joins which are not
>> unique.
>>
>
> I'd be very surprised it that had any measurable impact.
>
> How about changing the comment to:
>>
>> /*
>> * In the event that the inner side has been detected to be
>> * unique, as an optimization we can skip searching for any
>> * subsequent matching inner tuples, as none will exist.
>> * For semijoins unique_inner will always be true, although
>> * in this case we don't match another inner tuple as this
>> * is the required semi join behavior.
>> */
>>
>> Alternatively or additionally we can rename the variable in the executor
>> state, although I've not thought of a name yet that I don't find overly
>> verbose: unique_inner_or_semi_join,  match_first_tuple_only.
>>
>
> I'd go with match_first_tuple_only.


+1

unique_inner is a state that has been detected, match_first_tuple_only is
the action we take as a result.
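
To illustrate, the executor bits quoted upthread would then read roughly
like this (just a sketch of the rename, not a committed patch):

    /* in ExecInitHashJoin() */
    switch (node->join.jointype)
    {
        case JOIN_SEMI:
            hjstate->js.match_first_tuple_only = true;
            /* fall through */
        ...
    }

    /* in ExecHashJoin() */

    /*
     * Whether because the inner side is known unique or because this
     * is a semijoin, we're done with this outer tuple after its first
     * match.
     */
    if (node->js.match_first_tuple_only)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;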

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Michael Paquier
On Thu, Dec 17, 2015 at 3:44 PM, Simon Riggs  wrote:
> For me, rewriting the visibility map is a new data loss bug waiting to
> happen. I am worried that the group is not taking seriously the potential
> for catastrophe here.

FWIW, I'm following this line and merging the vm file into a single
unit looks like a ticking bomb. We may really want a separate _vm
file, like _vmf to track this new bit flag but this has already been
mentioned largely upthread...
-- 
Michael




Re: [HACKERS] extend pgbench expressions with functions

2015-12-16 Thread Michael Paquier
On Thu, Dec 17, 2015 at 1:42 PM, Michael Paquier
 wrote:
> In conclusion, for this CF the patch doing the documentation fixes is
> "Ready for committer", the second patch introducing the function
> infrastructure should focus more on its core structure and should be
> marked as "Returned with feedback". Opinions are welcome.

I forgot to attach the second patch that I rebased using the 1st one
modified with the documentation changes.
-- 
Michael


0001-Make-pgbench-documentation-more-precise-for-function.patch
Description: binary/octet-stream


0002-Patch-introducing-functions-in-pgbench.patch
Description: binary/octet-stream



Re: [HACKERS] Making tab-complete.c easier to maintain

2015-12-16 Thread Michael Paquier
On Thu, Dec 17, 2015 at 3:06 PM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 14 Dec 2015 14:13:02 +0900, Michael Paquier 
>  wrote in 
> 
>> On Mon, Dec 14, 2015 at 8:10 AM, Thomas Munro 
>>  wrote:
>> Yeah, I guess that's an improvement for those cases, and there is no
>> immediate need for a per-keyword NOT operator in our cases to allow
>> things of the type (foo1 OR NOT foo2). Still, in the case of this
>> patch !foo1|foo2 actually means (NOT foo1 AND NOT foo2). It does not
>> seem that much intuitive. Reading straight this expression it seems
>> that !foo1|foo2 means actually (NOT foo1 OR foo2) because the lack of
>> parenthesis. Thoughts?
>
> I used just the same expression as Thomas in my patch since it
> was enough intuitive in this context in my view. The expression
> "(!FOO1)|FOO2" is a nonsence in the context of tab-completion and
> won't be used in future. But it is true that "!(FOO|BAR|BAZ)" is
> clearer than without the parenthesis.

Yeah that's my whole point. If we decide to support that I think that
we had better go all the way through it, with stuff like:
- & for AND operator
- Support of parenthesis to prioritize expressions
- ! for NOT operator
- | for OR OPERATOR
But honestly at this stage this would just benefit 5 places (citing
Thomas' quote), hence let's just go with the simplest approach, which
is the one without all this fancy operator stuff... That will be less
bug-prone, and still benefits more than 95% of the tab completion code
path. At least I think that's the most realistic thing to do if we
want to get something for this commit fest. If someone wants those
operators, well I guess that he could add them afterwards.. Thomas,
what do you think?

> Addition to that, I feel that successive "MatchAny"s are a bit
> bothersome.
>
>>  TailMatches6("COMMENT", "ON", MatchAny, MatchAny, MatchAny, MatchAny)) &&
>>  !TailMatches1("IS")
>
> Is MachAny acceptable? On concern is the two n's
> (TailMatches and MatchAny) looks a bit confising.
>
>>  TailMatches4("COMMENT", "ON", MatchAny3, "!IS")

Ah, OK, so you would want to be able to have an inner list, MatchAnyN
meaning actually a list of MatchAny items repeated N times. I am not
sure if that's worth it... I would just keep it simple, and we are just
discussing only a couple of places that would benefit from that.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Simon Riggs
On 9 December 2015 at 18:31, Robert Haas  wrote:

> On Mon, Nov 30, 2015 at 12:58 PM, Bruce Momjian  wrote:
> > On Mon, Nov 30, 2015 at 10:48:04PM +0530, Masahiko Sawada wrote:
> >> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes 
> wrote:
> >> > On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada <
> sawada.m...@gmail.com> wrote:
> >> >>
> >> >> Yeah, we need to consider to compute checksum if enabled.
> >> >> I've changed the patch, and attached.
> >> >> Please review it.
> >> >
> >> > Thanks for the update.  This now conflicts with the updates doesn to
> >> > fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
> >> > conflict in order to do some testing, but I'd like to get an updated
> >> > patch from the author in case I did it wrong.  I don't want to find
> >> > bugs that I just introduced myself.
> >> >
> >>
> >> Thank you for having a look.
> >
> > I would not bother mentioning this detail in the pg_upgrade manual page:
> >
> > +   Since the format of visibility map has been changed in version 9.6,
> > +   pg_upgrade creates and rewrite new
> '_vm'
> > +   file even if upgrading from 9.5 or before to 9.6 or later with link
> mode (-k).
>
> Really?  I know we don't always document things like this, but it
> seems like a good idea to me that we do so.
>

Agreed.

For me, rewriting the visibility map is a new data loss bug waiting to
happen. I am worried that the group is not taking seriously the potential
for catastrophe here. I think we can do it, but I think it needs these
things

* Clear notice that it is happening unconditionally and unavoidably
* Log files showing it has happened, action by action
* Very clear mechanism for resolving an incomplete or interrupted upgrade
process. Which VMs got upgraded? Which didn't?
* Ability to undo an upgrade attempt, somehow, ideally automatically by
default
* Ability to restart a failed upgrade attempt without doing a "double
upgrade", i.e. ensure transformation is immutable

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Andres Freund
On 2015-12-17 15:56:35 +0900, Michael Paquier wrote:
> On Thu, Dec 17, 2015 at 3:44 PM, Simon Riggs  wrote:
> > For me, rewriting the visibility map is a new data loss bug waiting to
> > happen. I am worried that the group is not taking seriously the potential
> > for catastrophe here.
> 
> FWIW, I'm following this line and merging the vm file into a single
> unit looks like a ticking bomb.

And what are those risks?

> We may really want a separate _vm file, like _vmf to track this new
> bit flag but this has already been mentioned largely upthread...

That'd double the overhead when those bits get unset.




Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Michael Paquier
On Thu, Dec 17, 2015 at 4:10 PM, Andres Freund  wrote:
> On 2015-12-17 15:56:35 +0900, Michael Paquier wrote:
>> On Thu, Dec 17, 2015 at 3:44 PM, Simon Riggs  wrote:
>> > For me, rewriting the visibility map is a new data loss bug waiting to
>> > happen. I am worried that the group is not taking seriously the potential
>> > for catastrophe here.
>>
>> FWIW, I'm following this line and merging the vm file into a single
>> unit looks like a ticking bomb.
>
> And what are those risks?

Incorrect vm file rewrite after a pg_upgrade run.
-- 
Michael




Re: [HACKERS] Multi-tenancy with RLS

2015-12-16 Thread Haribabu Kommi
Rebased patch is attached, as the previous one had an OID conflict with the
latest set of changes
in the master branch.

Regards,
Hari Babu
Fujitsu Australia


4_database_catalog_tenancy_v3.patch
Description: Binary data



Re: [HACKERS] Disabling an index temporarily

2015-12-16 Thread Jim Nasby

On 12/16/15 12:15 AM, Jeff Janes wrote:

But also, while loading 1.5 million records into a table with 250
million records is horrible, rebuilding all the indexes on a 251.5
million record table from scratch is even more horrible.  I don't know
if suspending maintenance (either globally or just for one session)
and then doing a bulk fix-up would be less horrible, but I would be
willing to give it a test run.


I would think that's something completely different though, no? If 
you're doing that wouldn't you want other inserting/updating backends to 
still maintain the index, and only do something special in the backend 
that's doing the bulk load? Otherwise the bulk load would have to wait 
for all running backends to finish to ensure that no one was using the 
index. That's ugly enough for CIC; I can't fathom it working in any 
normal batch processing.


(Doing a single bulk insert to the index at the end of an INSERT should 
be safe because none of those tuples are visible yet, though I'd 
have to make sure your backend didn't try to use the index for anything 
while the command was running... like as part of a trigger.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Michael Paquier
On Wed, Dec 16, 2015 at 10:17 PM, Andres Freund  wrote:
> Again. Unless I miss something that was solved in
> www.postgresql.org/message-id/flat/20130615102028.gk19...@alap2.anarazel.de
>
> Personally I think we should add lz4 and maybe one strongly compressing
> algorithm and be done with it. Runtime extensibility will make this much
> more complicated, and I doubt it's worth the complexity.

+1, at the end we are going to have the same conclusion as for FPW
compression when we discussed that, let's just switch to something
that is less CPU-consuming than pglz, and lz4 is a good candidate for
that.
-- 
Michael




Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Jim Nasby

On 12/16/15 7:03 AM, Tomas Vondra wrote:


While versioning or increasing the 1GB limit are interesting, they have
nothing to do with this particular patch. (BTW I don't see how the
versioning would work at varlena level - that's something that needs to
be handled at data type level).


Right, but that's often going to be very hard to do and still support 
pg_upgrade. It's not like most types have spare bits laying around. 
Granted, this still means non-varlena types are screwed.



I think the only question we need to ask here is whether we want to
allow mixed compression for a column. If no, we're OK with the current
header. This is what the patch does, as it requires a rewrite after
changing the compression method.


I think that is related to the other items though: none of those other 
items (except maybe the 1G limit) seem to warrant dorking with varlena, 
but if there were 3 separate features that could make use of support for 
8 byte varlena then perhaps it's time to invest in that effort. 
Especially since IIRC we're currently out of bits, so if we wanted to 
change this we'd need to do it at least a release in advance so there 
was a version that would expand 4 byte varlena to 8 byte as needed.



And we're not painting ourselves in the corner - if we decide to
increase the varlena header size in the future, this patch does not make
it any more complicated.


True.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-12-16 Thread Tomas Vondra

Hi,

On 12/16/2015 11:40 PM, David Rowley wrote:

On 17 December 2015 at 05:02, Tomas Vondra wrote:

0) I know the patch does not tweak costing - any plans in this

direction? Would it be possible to simply use the costing used by
semijoin?


Many thanks for looking at this.

The patch does tweak the costings so that unique joins are costed in the
same way as semi joins. It's a very small change, so easy to miss.


Thanks. I missed that bit somehow.



For example, see the change in initial_cost_nestloop()

-   if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+   if (jointype == JOIN_SEMI ||
+   jointype == JOIN_ANTI ||
+   unique_inner)
 {

Also see the changes in final_cost_nestloop() and final_cost_hashjoin()


1) nodeHashjoin.c (and other join nodes)

I've noticed we have this in the ExecHashJoin() method:

  /*
   * When the inner side is unique or we're performing a
   * semijoin, we'll consider returning the first match, but
   * after that we're done with this outer tuple.
   */
  if (node->js.unique_inner)
  node->hj_JoinState = HJ_NEED_NEW_OUTER;

That seems a bit awkward because the comment speaks about unique
joins *OR* semijoins, but the check only references unique_inner.
That of course works because we do this in ExecInitHashJoin():

 switch (node->join.jointype)
 {
 case JOIN_SEMI:
 hjstate->js.unique_inner = true;
 /* fall through */

Either some semijoins may not be unique - in that case the naming is
misleading at best, and we either need to find a better name or
simply check two conditions like this:

  if (node->js.unique_inner || node->join.type == JOIN_SEMI)
 node->hj_JoinState = HJ_NEED_NEW_OUTER;

Or all semijoins are unique joins, and in that case the comment may
need rephrasing. But more importantly, it begs the question why
we're detecting this in the executor and not in the planner? Because
if we detect it in executor, we can't use this bit in costing, for
example.


The reason not to detect that in the planner is simply that unique_join
is meant to mean that the relation on the inner side of the join will at
most contain a single Tuple which matches each outer relation tuple,
based on the join condition. I've added no detection for this in semi
joins, and I'd rather not go blindly setting the flag to true in the
planner as it simply may not be true for the semi join. At the moment
that might not matter as we're only using the unique_join flag as an
optimization in the join nodes, but I'd prefer not to do this as its
likely we'll want to do more with this flag later, and I'd rather we
keep the meaning well defined. You might argue that this is also true
for semi joins, but if down the road somewhere we want to perform some
checks on the inner relation before the join takes place, and in that
case the Tuples of the relation might not have the same properties we
claim they do.

But you're right that reusing the flag in the join nodes is not ideal,
and the comment is not that great either. I'd really rather go down the
path of either renaming the variable, or explaining this better in the
comment. It seems unnecessary to check both for each tuple being joined.
I'd imagine that might add a little overhead to joins which are not unique.


I'd be very surprised it that had any measurable impact.


How about changing the comment to:

/*
* In the event that the inner side has been detected to be
* unique, as an optimization we can skip searching for any
* subsequent matching inner tuples, as none will exist.
* For semijoins unique_inner will always be true, although
* in this case we don't match another inner tuple as this
* is the required semi join behavior.
*/

Alternatively or additionally we can rename the variable in the executor
state, although I've not thought of a name yet that I don't find overly
verbose: unique_inner_or_semi_join,  match_first_tuple_only.


I'd go with match_first_tuple_only.




2) analyzejoins.c

I see that this code in join_is_removable()

 innerrel = find_base_rel(root, innerrelid);

 if (innerrel->reloptkind != RELOPT_BASEREL)
 return false;

was rewritten like this:

 innerrel = find_base_rel(root, innerrelid);

 Assert(innerrel->reloptkind == RELOPT_BASEREL);

That suggests that previously the function expected cases where
reloptkind may not be RELOPT_BASEREL, but now it'll error out in
such cases. I haven't noticed any changes in the surrounding code
that'd guarantee this won't happen, but I also haven't been able to
come up with an example triggering the assert (haven't been trying
too hard). How do we know the assert() makes sense?


I'd have 

Re: [HACKERS] [PATCH] Logical decoding support for sequence advances

2015-12-16 Thread Craig Ringer
On 15 December 2015 at 20:17, Andres Freund  wrote:
>
>
> I think this is quite the wrong approach. You're calling the logical
> decoding callback directly from decode.c, circumventing
> reorderbuffer.c. While you don't need the actual reordering, you *do*
> need the snapshot integration.


Yeah. I thought I could get away without it because I could use
Form_pg_sequence.sequence_name to bypass any catalog lookups at all, but
per upthread that field should be ignored and can't be used.


> As you since noticed you can't just
> look into the sequence tuple and be happy with that, you need to do
> pg_class lookups et al. Furthermore callbacks can validly expect to have
> a snapshot set.
>

Good point. Just because I don't need one doesn't mean others won't, and
all the other callbacks do.

I'll have a read over reorderbuffer.c and see if I can integrate this
properly.


> At the very least what you need to do is to SetupHistoricSnapshot(),
> then lookup the actual identity of the sequence using
> RelidByRelfilenode() and only then you can call the callback.
>

Yeah. That means it's safe for the callback to do a syscache lookup for the
sequence name by oid.
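
Something along these lines, then. This is a sketch only: sequence_cb is
a hypothetical callback name for this patch, and rnode/new_value stand
for whatever the decoded WAL record carries; the snapshot and relcache
functions are the existing ones Andres mentions:

    Oid         reloid;
    Relation    relation;

    /* make catalog/syscache lookups use the historic snapshot */
    SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);

    /* resolve the sequence's real identity from its relfilenode */
    reloid = RelidByRelfilenode(rnode.spcNode, rnode.relNode);
    relation = RelationIdGetRelation(reloid);

    /* only now is it safe to hand control to the output plugin */
    ctx->callbacks.sequence_cb(ctx, txn, lsn, relation, new_value);

    RelationClose(relation);
    TeardownHistoricSnapshot(false);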


> > Needed to make logical replication based failover possible.
>
> While it'll make it easier, I think it's certainly quite possible to do
> so without this feature if you accept some sane restrictions. If you
> e.g. decide to only handle serial columns (i.e. sequences that are
> used/owned by exactly one table & column), you can do a 'good enough'
> emulation in the output plugin.
>
> Take something like BDR's bdr_relcache.c (something which you're going
> to end up needing for any nontrivial replication solution). Every time
> you build a cache entry you look up whether there's a sequence owned by
> the relation.  When decoding an insert or update to that relation, also emit
> an 'increase this sequence to at least XXX' message.
>

Eh. I don't think it's good enough for failover really. Too much of a
foot-gun risk for my taste. We've both seen the weird things users do and
the way nobody reads the manual or understands the limitations (global DDL
lock anybody?) and I really want to make things work right by default with
minimal caveats.

I'm not against working around it, and we'll need to for 9.4/9.5, but I'm
trying to get something more solid into 9.6.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Remove array_nulls?

2015-12-16 Thread Jim Nasby

On 12/16/15 6:01 PM, Robert Haas wrote:

On Tue, Dec 15, 2015 at 1:26 AM, Michael Paquier
 wrote:

On Tue, Dec 15, 2015 at 2:57 AM, Jim Nasby  wrote:

On 12/11/15 2:57 PM, Tom Lane wrote:

Jim Nasby  writes:
Perhaps, but I'd like to have a less ad-hoc process about it.  What's
our policy for dropping backwards-compatibility GUCs?  Are there any
others that should be removed now as well?



Perhaps it should be tied to bumping the major version number, which I'm
guessing would happen next whenever we get parallel query execution. If we
do that, a reasonable policy might be that a compatibility GUC lives across
no more than 1 major version bump (i.e., we wouldn't remove something in 9.0
that was added in 8.4).


Another possibility may be to link that with the 5-year maintenance
window of community: a compatibility GUC is dropped in the following
major release if the oldest stable version maintained is the one that
introduced it. Just an idea.


Yeah, there's something to be said for that, although to be honest in
most cases I'd prefer to wait longer.   I wonder about perhaps
planning to drop things after two lifecycles.  That is, when the
release where the incompatibility was added goes out of support, we
don't do anything, but when the release that was current when it went
out of support is itself out of support, then we drop the GUC.  For
example, 8.2 went EOL in December 2011, at which point the newest
release was 9.1.  So when 9.1 is out of support, then we could drop
it; that's due to happen this September.  So 9.6 (or 10.0, if we call
it that) could drop it.


IIUC, that means supporting backwards-compatibility GUCs for 10 years, which 
seems a bit excessive. Granted, that's about the worst-case scenario for 
what I proposed (i.e., we'd still be supporting 8.0 stuff right now).


The reason I thought of tying it to "major major" release is just 
because those generate even more notice and attract more users than 
normal, so it'd be nice to clean house before doing one. Perhaps I'm 
just introducing complexity that there's no need for.


If we don't want to tie it to the "major major" number, then I think we should 
just go with the normal support cycle.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-12-16 Thread Peter Geoghegan
On Wed, Dec 16, 2015 at 11:44 PM, Peter Geoghegan  wrote:
>> In any case, at this point 9.5 is really aimed to be stabilized, so
>> targeting only master is a far saner approach IMO for this patch.
>> Pushing that in 9.5 a couple of months back may have given enough
>> reason to do so... But well life is life.
>
> No, this really isn't an optimization at all.

I should add: I think that the chances of this patch destabilizing the
code are very slim, once it receives the proper review. Certainly, I
foresee no possible downside to not inserting the doomed IndexTuple,
since it's guaranteed to have its heap tuple super-deleted immediately
afterwards.

That's the only real behavioral change proposed here. So, I would
prefer it if we got this in before the first stable release of 9.5.

-- 
Peter Geoghegan




[HACKERS] pg_tables bug?

2015-12-16 Thread Gaetano Mendola
I'm playing around with tablespaces (PostgreSQL 9.4) and I found what I
believe is a bug in pg_tables.
Basically, if you create a database in a tablespace X and then you create a
table in that database, the table is created correctly in tablespace X
(I did a check on the filesystem). However, if you do a select on pg_tables,
the tablespace column for that table is empty, and even worse, if you dump
the DB there is no mention of the database or table being in that
tablespace.
Even \d doesn't report that the table is in tablespace X.
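
For reference, a minimal way to reproduce what I am describing (the path
and names here are made up):

    CREATE TABLESPACE x LOCATION '/srv/pg_tblspc_x';
    CREATE DATABASE testdb TABLESPACE x;
    \c testdb
    CREATE TABLE t (id int);
    SELECT tablename, tablespace FROM pg_tables WHERE tablename = 't';

The tablespace column comes back empty here, even though the underlying
file lives under the tablespace x directory.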

Regards


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-16 Thread Andres Freund
On 2015-12-17 16:22:24 +0900, Michael Paquier wrote:
> On Thu, Dec 17, 2015 at 4:10 PM, Andres Freund  wrote:
> > On 2015-12-17 15:56:35 +0900, Michael Paquier wrote:
> >> On Thu, Dec 17, 2015 at 3:44 PM, Simon Riggs  wrote:
> >> > For me, rewriting the visibility map is a new data loss bug waiting to
> >> > happen. I am worried that the group is not taking seriously the potential
> >> > for catastrophe here.
> >>
> >> FWIW, I'm following this line and merging the vm file into a single
> >> unit looks like a ticking bomb.
> >
> > And what are those risks?
> 
> Incorrect vm file rewrite after a pg_upgrade run.

If we can't manage to rewrite a file, replacing a binary b1 with a b10,
then we shouldn't be working on a database. And if we screw up, recovery
is an rm *_vm away. I can't imagine that this is going to be the
actually complicated part of this feature.




Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2015-12-16 Thread Peter Geoghegan
On Wed, Dec 16, 2015 at 11:20 PM, Michael Paquier
 wrote:
> (A couple of months later)
> This is not an actual fix, but an optimization, no?
> UNIQUE_CHECK_SPECULATIVE is just used to optimize a couple of code
> paths in the case of a insert conflicting during btree insertion..
>
> In any case, at this point 9.5 is really aimed to be stabilized, so
> targeting only master is a far saner approach IMO for this patch.
> Pushing that in 9.5 a couple of months back may have given enough
> reason to do so... But well life is life.

No, this really isn't an optimization at all. It has nothing to do
with performance, since I think that unnecessarily inserting an index
tuple in practice has very little or no appreciable overhead.

The point of avoiding that is that it makes no sense, while doing it
implies that it does make sense. (i.e. It does not make sense to
insert into a B-Tree leaf page an IndexTuple pointing to a speculative
heap tuple with the certain knowledge that we'll have to super-delete
the speculative heap tuple in the end anyway).

This is 100% about clarifying the intent and design of the code.

-- 
Peter Geoghegan




Re: [HACKERS] Parallel Aggregate

2015-12-16 Thread David Rowley
On 16 December 2015 at 18:11, Haribabu Kommi 
wrote:

> On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey 
> wrote:
> > But the run dies.
> >
> > NOTICE:  SRID value -32897 converted to the officially unknown SRID
> value 0
> > ERROR:  Unknown geometry type: 2139062143 - Invalid type
> >
> > From the message it looks like geometry gets corrupted at some point,
> > causing a read to fail on very screwed up metadata.
>
> Thanks for the test. There was some problem in advance_combination_function
> in handling pass by reference data. Here I attached updated patch with the
> fix.


Thanks for fixing this.

I've been playing around with this patch and looking at the code, and I
just have a few general questions that I'd like to ask to see if there are
any good answers for them yet.

One thing I noticed is that you're only enabling Parallel aggregation when
there's already a Gather node in the plan. Perhaps this is fine for a proof
of concept, but I'm wondering how we can move forward from this to
something that can be committed. As the patch is today, it means
that you don't get a parallel agg plan for:

Query 1: select count(*) from mybigtable;

but you do for:

Query 2: select count(*) from mybigtable where ;

since the  allows parallel seq scan to win
over a seq scan, as it does not cost as much to move tuples from the
worker process into the main process. If that Gather node is found, the code
shuffles the plan around so that the partial agg node is below it and
sticks a Finalize Agg node above the Gather. I imagine you wrote this with
the intention of finding something better later, once we see that it all
can work, and cool, it seems to work!
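
For Query 2, that gives a plan shaped roughly like this (illustrative
only, not actual EXPLAIN output from the patch):

    Finalize Aggregate
      ->  Gather
            Number of Workers: 2
            ->  Partial Aggregate
                  ->  Parallel Seq Scan on mybigtable
                        Filter: <the highly selective qual>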

I'm not quite sure what the general solution to improve on this is
yet, as it's not really that great if we can't get parallel aggregation on
Query 1 but we can on Query 2.

In master today we seem to aim to parallelise at the path level, which
seems fine if we only aim to have SeqScan as the only parallel enabled
node, but once we have several other nodes parallel enabled, we might, for
instance, want to start parallelising whole plan tree branches, if all
nodes in that branch happen to support parallelism.

I'm calling what we have today "keyhole parallelism", because we enable
parallelism while looking at a tiny part of the picture. I get the
impression that the bigger picture has been overlooked as perhaps it's more
complex and keyhole parallelism at least allows us to get something in
that's parallelised, but this patch indicates to me that we're already
hitting the limits of that; should we rethink? I'm concerned as I've come
to learn that changing this sort of thing after a release is much harder, as
people start to find cases where performance regresses, which makes it much
more difficult to improve things.

Also my apologies if I've missed some key conversation about how all of the
above is intended to work. Please feel free to kick me into line if that's
the case.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andreas Karlsson

On 12/14/2015 01:57 PM, Merlin Moncure wrote:

This may be moot; some testing demonstrated that libedit was not
impacted so it really comes down to having the right readline api call
available.  Looking at the code ISTM that libedit resets the terminal
on every prompt.


Did you manage to figure out why one was better than the other? The 
differences between the functions seem rather subtle.


rl_reset_screen_size()

Does not respect the COLUMNS and ROWS environment variables.

_rl_sigwinch_resize_terminal()

Internal callback, does the same thing as
rl_reset_screen_size() except that it also respects COLUMNS and ROWS.

rl_resize_terminal()

Respects COLUMNS and ROWS and also has some logic when echo mode is 
turned on which I have not managed to understand yet.
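
For context, the call site being discussed upthread would look something
like this (a sketch of the gets_interactive() idea, assuming a configure
check for the newer readline API):

    static char *
    gets_interactive(const char *prompt)
    {
    #ifdef HAVE_RL_RESET_SCREEN_SIZE
        /*
         * Pick up any window resize that happened while readline's
         * SIGWINCH handler was not installed, e.g. during a long query
         * or while the pager was running.
         */
        rl_reset_screen_size();
    #endif
        return readline(prompt);
    }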


Btw, really nice to see someone working at this. It has bugged me a long 
time.


Andreas




Re: [HACKERS] pg_stat_replication log positions vs base backups

2015-12-16 Thread Michael Paquier
On Wed, Dec 16, 2015 at 7:14 PM, Magnus Hagander  wrote:
> On Wed, Dec 16, 2015 at 8:34 AM, Michael Paquier 
> wrote:
>> Interesting. I got just today a bug report that is actually a symptom
>> that people should be careful about: it is possible that
>> pg_stat_replication reports 1/potential for sync_priority/sync_state
>> in the case of a WAL sender in "backup" state: a base backup just
>> needs to reuse the shared memory slot of a standby that was previously
>> connected. Commit 61c7bee of Magnus fixes the issue, just let's be
>> careful if there are similar reports that do not include this fix.
>
>
> Hmm. With the fix, it returns "async", right?

Yes, it returns async with priority set at 0 after your commit. That's
a bit better than the old behavior, still..

> Perhaps it should return either "backup" or NULL, to be even more clear? And
> with priority set to NULL?

I'd vote for just NULL for both fields. This async/sync status has no
real meaning for a backup. Thoughts?
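
To make the symptom concrete, a hypothetical pre-fix session (column
names from pg_stat_replication; the values are what the behavior
described above would show for a backup WAL sender reusing a sync
standby's slot):

    SELECT application_name, state, sync_priority, sync_state
      FROM pg_stat_replication;

     application_name | state  | sync_priority | sync_state
    ------------------+--------+---------------+------------
     pg_basebackup    | backup |             1 | potential

With what I am proposing, sync_priority and sync_state would both be
NULL for such a row.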
-- 
Michael




Re: [HACKERS] Cube extension kNN support

2015-12-16 Thread Tomas Vondra

Hi,

On 12/16/2015 01:26 PM, Stas Kelvich wrote:

Hi, thanks for the review.


1) (nitpicking) There seem to be some minor whitespace issues, i.e.
   trailing spaces, empty lines being added/removed, etc.



Fixed, I think


2) one of the regression tests started to fail

   SELECT '-1e-700'::cube AS cube;

   This used to return (0) but now I get (-0).


Actually that problem emerged because of the first problem. I had
extra whitespace in the sql file and removed that whitespace from one of
the answer files (cube_1.sql), so the diff with both cube.sql and
cube_1.sql was one line long and you saw the diff with cube.sql.

On all systems available to me (osx/linux/freebsd) I saw that the
right answer file is cube_1.sql. But on other OSes you can get +/- 0
or e27/e027. I edited those answer files manually, so there can probably
be some other typos.

Ah! So that's why I couldn't quickly find the issue in the C code ...




3) I wonder why the docs were changed like this:

   
-   It does not matter which order the opposite corners of a cube are
-   entered in.  The cube functions
-   automatically swap values if needed to create a uniform
-   lower left  upper right internal representation.
+   When corners coincide cube stores only one corner along with a
special flag in order to reduce size wasted.
   

   Was the old behavior removed? I don't think so - it seems to behave
   as before, so why to remove this information? Maybe it's not useful?
   But then why add the bit about optimizing storage of points?


I've edited it because the statement was misleading (or at least ambiguous):
the cube_in function doesn't swap coordinates.
Simple way to see it:

select '(1,3),(3,1)'::cube;

  cube
---
  (1, 3),(3, 1)

But LowerLeft-UpperRight representation should be (1,1),(3,3)


I don't think that's what the comment says, actually. It rather refers 
to code like this:


result = Min(LL_COORD(c, n - 1), UR_COORD(c, n - 1));

i.e. if you specifically ask for a particular corner (ll, in this case), 
you'll get the proper value.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Tomas Vondra

Hi,

On 12/14/2015 12:51 PM, Simon Riggs wrote:

On 13 December 2015 at 17:28, Alexander Korotkov wrote:

it would be nice to make compression methods pluggable.


Agreed.

My thinking is that this should be combined with work to make use of
the compressed data, which is why Alvaro, Tomas, David have been
working on Col Store API for about 18 months and work on that
continues with more submissions for 9.6 due.


I'm not sure it makes sense to combine those two uses of compression, 
because there are various differences - some subtle, some less subtle. 
It's a bit difficult to discuss this without any column store 
background, but I'll try anyway.


The compression methods discussed in this thread, used to compress a 
single varlena value, are "general-purpose" in the sense that they 
operate on an opaque stream of bytes, without any additional context (e.g. 
about the structure of the data being compressed). So essentially the 
methods have an API like this:


  int   compress(char *src, int srclen, char *dst, int dstlen);
  int decompress(char *src, int srclen, char *dst, int dstlen);

And possibly some auxiliary methods like "estimate compressed length" 
and such.


OTOH the compression methods we're messing with while working on the 
column store are quite different - they operate on columns (i.e. "arrays 
of Datums"). Also, column stores prefer "light-weight" compression 
methods like RLE or DICT (dictionary compression) because those methods 
allow execution on compressed data when done properly. That, for example, 
requires additional info about the data type in the column, so that the 
RLE groups match the data type length.


So the API of those methods looks quite different, compared to the 
general-purpose methods. Not only the compression/decompression methods 
will have additional parameters with info about the data type, but 
there'll be methods used for iterating over values in the compressed 
data etc.
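
To make the contrast concrete, a type-aware column-compression API might
look very roughly like this (purely illustrative; none of these names
exist anywhere):

    typedef struct ColumnCompressionRoutine
    {
        /* compress an array of values of the given type */
        int     (*compress_column) (Oid typid, int16 typlen, bool typbyval,
                                    Datum *values, int nvalues,
                                    char *dst, int dstlen);

        /* start iterating over compressed data without decompressing */
        void   *(*begin_iterate) (Oid typid, char *src, int srclen);

        /* fetch the next value; returns false when exhausted */
        bool    (*iterate_next) (void *iter, Datum *value);
    } ColumnCompressionRoutine;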


Of course, it'd be nice to have the ability to add/remove even those 
light-weight methods, but I'm not sure it makes sense to squash them 
into the same catalog. I can imagine a catalog suitable for both APIs 
(essentially having two groups of columns, one for each type of 
compression algorithm), but I can't really imagine a compression method 
providing both interfaces at the same time.


In any case, I don't think this is the main challenge the patch needs to 
solve at this point.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()

2015-12-16 Thread Robert Haas
On Wed, Dec 16, 2015 at 3:34 AM, amul sul  wrote:
> Updated patch to add this table creation case in regression tests.
> PFA patch version V3.

I committed the previous version just now after fixing various things.
In particular, you added a function called from one place that took a
Boolean argument that always had the same value.  You've got to either
call it from more than one place, or remove the argument.  I picked
the former.  Also, I added a test case and made a few other tweaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-16 Thread Amit Kapila
On Wed, Dec 16, 2015 at 6:04 PM, Robert Haas  wrote:
>
> On Wed, Dec 16, 2015 at 1:34 AM, Amit Kapila  wrote:
> > Yes, that's one thing I wanted to know; yet another point which is not
> > clear to me about this Async infrastructure is why the current
> > infrastructure of Parallelism can't be used to achieve the Async
> > benefits of ForeignScan?
>
> Well, all a ForeignScan by postgres_fdw does is read the tuples that
> are generated remotely.  Turning around and sticking those into a
> Funnel doesn't seem like it gains much: now instead of having to read
> tuples from someplace, the leader has to read tuples from some other
> place.  Yeah, there are cases where it could win, like when there's a
> selective nonpushable qual, but that's not that exciting.
>
> There's another, more serious problem: if the leader has a connection
> open to the remote server and that connection is in mid-transaction,
> you can't have a worker open a new connection without changing the
> semantics.  Working around that problem looks hard to me.
>

Okay.  It seems there are cases where it could benefit from Async
execution infrastructure rather than from directly using the Gather-node
style of infrastructure.  I am not the right person to judge whether
there are enough such cases to justify a new infrastructure, but I think
it is a point to consider and I am sure you will make the right move.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Andres Freund
On 2015-12-16 14:14:36 +0100, Tomas Vondra wrote:
> I think the main obstacle to make this possible is the lack of free space in
> varlena header / need to add the ID of the compression method into the
> value.
> 
> FWIW I'd like to allow this (mixing compression types), but I don't think
> it's worth the complexity at this point. We can add that later, if it turns
> out to be a problem in practice (which it probably won't).

Again. Unless I miss something that was solved in
www.postgresql.org/message-id/flat/20130615102028.gk19...@alap2.anarazel.de

Personally I think we should add lz4 and maybe one strongly compressing
algorithm and be done with it. Runtime extensibility will make this much
more complicated, and I doubt it's worth the complexity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Potential pointer dereference in plperl.c (caused by transforms patch)

2015-12-16 Thread Alvaro Herrera
Noah Misch wrote:

> fcinfo->flinfo->fn_oid==InvalidOid implies an inline block, and those have no
> arguments.  If it placates Coverity, I lean toward an assert-only change:
> 
> --- a/src/pl/plperl/plperl.c
> +++ b/src/pl/plperl/plperl.c

This was committed as d4b686af0b.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension kNN support

2015-12-16 Thread Stas Kelvich
> I don't think that's what the comment says, actually. It rather refers to 
> code like this:
> 
>result = Min(LL_COORD(c, n - 1), UR_COORD(c, n - 1));
> 
> i.e. if you specifically ask for a particular corner (ll, in this case), 
> you'll get the proper value.

Hmm, I was confused by the phrase “create a uniform _internal_ representation”;
actually, internally the cube is stored “as is”. But probably I just
misinterpreted that. So here is the updated version with the old
documentation restored.



cube_distances.patch
Description: Binary data



> On 16 Dec 2015, at 16:46, Tomas Vondra  wrote:
> 
> Hi,
> 
> On 12/16/2015 01:26 PM, Stas Kelvich wrote:
>> Hi, thanks for the review.
>> 
>>> 1) (nitpicking) There seem to be some minor whitespace issues, i.e.
>>>   trailing spaces, empty lines being added/removed, etc.
>> 
>> 
>> Fixed, I think
>> 
>>> 2) one of the regression tests started to fail
>>> 
>>>   SELECT '-1e-700'::cube AS cube;
>>> 
>>>   This used to return (0) but now I get (-0).
>> 
>> Actually that problem emerged because of the first problem. I had extra
>> whitespace in the sql file and removed that whitespace from one of the
>> answer files (cube_1.sql), so the diff against both cube.sql and
>> cube_1.sql was one line long, and you saw the diff against cube.sql.
>> In all systems available to me (osx/linux/freebsd) I saw that the right
>> answers file is cube_1.sql. But on other OSes you can get +/- 0 or
>> e27/e027. I edited those answer files manually, so there can probably
>> be some other typos.
> 
> Ah! So that's why I couldn't quickly find the issue in the C code ...
> 
>> 
>>> 3) I wonder why the docs were changed like this:
>>> 
>>>   
>>> -   It does not matter which order the opposite corners of a cube are
>>> -   entered in.  The cube functions
>>> -   automatically swap values if needed to create a uniform
>>> -   lower left  upper right internal representation.
>>> +   When corners coincide cube stores only one corner along with a
>>>special flag in order to reduce size wasted.
>>>   
>>> 
>>>   Was the old behavior removed? I don't think so - it seems to behave
>>>   as before, so why to remove this information? Maybe it's not useful?
>>>   But then why add the bit about optimizing storage of points?
>> 
>> I’ve edited it because the statement was misleading (or at least ambiguous) —
>> the cube_in function doesn’t swap coordinates.
>> A simple way to see it:
>>> select '(1,3),(3,1)'::cube;
>>  cube
>> ---
>>  (1, 3),(3, 1)
>> 
>> But LowerLeft-UpperRight representation should be (1,1),(3,3)
> 
> I don't think that's what the comment says, actually. It rather refers to 
> code like this:
> 
>result = Min(LL_COORD(c, n - 1), UR_COORD(c, n - 1));
> 
> i.e. if you specifically ask for a particular corner (ll, in this case), 
> you'll get the proper value.
> 
> regards
> 
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-12-16 Thread Fabien COELHO


Here is a two-part v12, which:

part a (refactoring of scripts and their stats):
 - fix option checks (-i alone)
 - s/repleacable/replaceable/ in doc
 - keep small description in doc and help for -S & -N
 - fix 2 comments for pg style
 - show builtin list if not found

part b (weight)
 - check that the weight is an int

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0ac40f1..62ce496 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -261,6 +261,23 @@ pgbench  options  dbname
 benchmarking arguments:
 
 
+ 
+  -b scriptname
+  --builtin scriptname
+  
+   
+Add the specified builtin script to the list of executed scripts.
+Available builtin scripts are: tpcb-like,
+simple-update and select-only.
+The provided scriptname needs only to be a prefix
+of the builtin name, hence simp would be enough to select
+simple-update.
+With special name list, show the list of builtin scripts
+and exit immediately.
+   
+  
+ 
+
 
  
   -c clients
@@ -307,14 +324,13 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename
+  --file=filename
   

-Read transaction script from filename.
+Add a transaction script read from filename to
+the list of executed scripts.
 See below for details.
--N, -S, and -f
-are mutually exclusive.

   
  
@@ -404,10 +420,8 @@ pgbench  options  dbname
   --skip-some-updates
   

-Do not update pgbench_tellers and
-pgbench_branches.
-This will avoid update contention on these tables, but
-it makes the test case even less like TPC-B.
+Run builtin simple-update script.
+Shorthand for -b simple-update.

   
  
@@ -512,9 +526,9 @@ pgbench  options  dbname
 Report the specified scale factor in pgbench's
 output.  With the built-in tests, this is not necessary; the
 correct scale factor will be detected by counting the number of
-rows in the pgbench_branches table.  However, when testing
-custom benchmarks (-f option), the scale factor
-will be reported as 1 unless this option is used.
+rows in the pgbench_branches table.
+However, when testing only custom benchmarks (-f option),
+the scale factor will be reported as 1 unless this option is used.

   
  
@@ -524,7 +538,8 @@ pgbench  options  dbname
   --select-only
   

-Perform select-only transactions instead of TPC-B-like test.
+Run built-in select-only script.
+Shorthand for -b select-only.

   
  
@@ -674,7 +689,17 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   The default transaction script issues seven commands per transaction:
+   Pgbench executes test scripts chosen randomly from a specified list.
+   They include built-in scripts with -b and
+   user-provided custom scripts with -f.
+ 
+
+  
+   The default builtin transaction script (also invoked with -b tpcb-like)
+   issues seven commands per transaction over randomly chosen aid,
+   tid, bid and balance.
+   The scenario is inspired by the TPC-B benchmark, but is not actually TPC-B,
+   hence the name.
   
 
   
@@ -688,9 +713,15 @@ pgbench  options  dbname
   
 
   
-   If you specify -N, steps 4 and 5 aren't included in the
-   transaction.  If you specify -S, only the SELECT is
-   issued.
+   If you select the simple-update builtin (also -N),
+   steps 4 and 5 aren't included in the transaction.
+   This will avoid update contention on these tables, but
+   it makes the test case even less like TPC-B.
+  
+
+  
+   If you select the select-only builtin (also -S),
+   only the SELECT is issued.
   
  
 
@@ -702,10 +733,7 @@ pgbench  options  dbname
benchmark scenarios by replacing the default transaction script
(described above) with a transaction script read from a file
(-f option).  In this case a transaction
-   counts as one execution of a script file.  You can even specify
-   multiple scripts (multiple -f options), in which
-   case a random one of the scripts is chosen each time a client session
-   starts a new transaction.
+   counts as one execution of a script file.
   
 
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f2d435b..1e0c7cc 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -164,12 +164,11 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+bool		per_script_stats = false;	/* whether to collect stats per script */
 int			progress = 0;		/* thread progress report every 

Re: [HACKERS] Cube extension kNN support

2015-12-16 Thread Stas Kelvich
Hi, thanks for the review.

> 1) (nitpicking) There seem to be some minor whitespace issues, i.e.
>   trailing spaces, empty lines being added/removed, etc.


Fixed, I think

> 2) one of the regression tests started to fail
> 
>   SELECT '-1e-700'::cube AS cube;
> 
>   This used to return (0) but now I get (-0).

Actually that problem emerged because of the first problem. I had extra
whitespace in the sql file and removed that whitespace from one of the
answer files (cube_1.sql), so the diff against both cube.sql and
cube_1.sql was one line long, and you saw the diff against cube.sql.
In all systems available to me (osx/linux/freebsd) I saw that the right
answers file is cube_1.sql. But on other OSes you can get +/- 0 or
e27/e027. I edited those answer files manually, so there can probably
be some other typos.

> 3) I wonder why the docs were changed like this:
> 
>   
> -   It does not matter which order the opposite corners of a cube are
> -   entered in.  The cube functions
> -   automatically swap values if needed to create a uniform
> -   lower left  upper right internal representation.
> +   When corners coincide cube stores only one corner along with a
>special flag in order to reduce size wasted.
>   
> 
>   Was the old behavior removed? I don't think so - it seems to behave
>   as before, so why to remove this information? Maybe it's not useful?
>   But then why add the bit about optimizing storage of points?

I’ve edited it because the statement was misleading (or at least ambiguous) —
the cube_in function doesn’t swap coordinates.
A simple way to see it:
> select '(1,3),(3,1)'::cube;
 cube  
---
 (1, 3),(3, 1)

But LowerLeft-UpperRight representation should be (1,1),(3,3)


Updated patch attached.




cube_distances.patch
Description: Binary data



> On 15 Dec 2015, at 21:46, Tomas Vondra  wrote:
> 
> Hi,
> 
> On 12/07/2015 03:47 PM, Stas Kelvich wrote:
>> Hello, fixed.
> 
> I've looked at the patch today, seems mostly fine to me.
> 
> Three comments though:
> 
> 1) (nitpicking) There seem to be some minor whitespace issues, i.e.
>   trailing spaces, empty lines being added/removed, etc.
> 
> 2) one of the regression tests started to fail
> 
>   SELECT '-1e-700'::cube AS cube;
> 
>   This used to return (0) but now I get (-0). As this query existed in
>   1.0, it's probably due to change in the C code. Now sure where.
> 
> 3) I wonder why the docs were changed like this:
> 
>   
> -   It does not matter which order the opposite corners of a cube are
> -   entered in.  The cube functions
> -   automatically swap values if needed to create a uniform
> -   lower left  upper right internal representation.
> +   When corners coincide cube stores only one corner along with a
>special flag in order to reduce size wasted.
>   
> 
>   Was the old behavior removed? I don't think so - it seems to behave
>   as before, so why to remove this information? Maybe it's not useful?
>   But then why add the bit about optimizing storage of points?
> 
> regards
> 
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-16 Thread Robert Haas
On Wed, Dec 16, 2015 at 1:34 AM, Amit Kapila  wrote:
> Yes, that's one thing I wanted to know; yet another point which is not
> clear to me about this Async infrastructure is why the current
> infrastructure of Parallelism can't be used to achieve the Async
> benefits of ForeignScan?

Well, all a ForeignScan by postgres_fdw does is read the tuples that
are generated remotely.  Turning around and sticking those into a
Funnel doesn't seem like it gains much: now instead of having to read
tuples from someplace, the leader has to read tuples from some other
place.  Yeah, there are cases where it could win, like when there's a
selective nonpushable qual, but that's not that exciting.

There's another, more serious problem: if the leader has a connection
open to the remote server and that connection is in mid-transaction,
you can't have a worker open a new connection without changing the
semantics.  Working around that problem looks hard to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Tomas Vondra



On 12/14/2015 07:28 PM, Jim Nasby wrote:

On 12/14/15 12:50 AM, Craig Ringer wrote:

The issue with per-Datum is that TOAST claims two bits of a varlena
header, which already limits us to 1 GiB varlena values, something
people are starting to find to be a problem. There's no wiggle room to
steal more bits. If you want pluggable compression you need a way to
store knowledge of how a given datum is compressed with the datum or
have a fast, efficient way to check.


... unless we allowed for 8 byte varlena headers. Compression changes
themselves certainly don't warrant that, but if people are already
unhappy with 1GB TOAST then maybe that's enough.

The other thing this might buy us are a few bits that could be used to
support Datum versioning for other purposes, such as when the binary
format of something changes. I would think that at some point we'll need
that for pg_upgrade.


While versioning or increasing the 1GB limit are interesting, they have 
nothing to do with this particular patch. (BTW I don't see how the 
versioning would work at varlena level - that's something that needs to 
be handled at data type level).


I think the only question we need to ask here is whether we want to 
allow mixed compression for a column. If no, we're OK with the current 
header. This is what the patch does, as it requires a rewrite after 
changing the compression method.


And we're not painting ourselves in the corner - if we decide to 
increase the varlena header size in the future, this patch does not make 
it any more complicated.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-16 Thread Amit Kapila
On Tue, Dec 15, 2015 at 7:31 PM, Robert Haas  wrote:
>
> On Mon, Dec 14, 2015 at 8:38 AM, Amit Kapila  wrote:
> > set enable_hashjoin=off;
> > set enable_mergejoin=off;
>
> [ ... ]
>
>
> > Now here the point to observe is that non-parallel case uses both less
> > Execution time and Planning time to complete the statement.  There
> > is a considerable increase in planning time without any benefit in
> > execution.
>
> So, you forced the query planner to give you a bad plan, and then
> you're complaining that the plan is bad?
>

Oh no, I wanted to check the behaviour of parallel vs. non-parallel
execution of joins.  I think even with hash and merge joins set to off,
it should have picked a non-parallel NestLoop plan.  In any case, I have
done some more investigation of the patch and found that even without
changing query-planner-related parameters, it seems to give bad plans
(as in the example below [1]).  I think the costing of the rework each
worker has to do is missing, which causes bad plans to be selected over
good plans.  Another point is that with the patch, the number of paths
we explore to get the cheapest path has increased; do you think we
should try to evaluate that?  One way is to run some queries with a
larger number of joins and see the impact on planning time; another is
to calculate the increase in the number of paths the planner explores.


[1] -
CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 300) g;

Analyze t1;
Analyze t2;

Non-parallel case

postgres=# Explain Analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t1.c1 BETWEEN 100 AND 200;
QUERY PLAN


---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=261519.93..261519.94 rows=1 width=0) (actual
time=2779.965..2779.965 rows=1 loops=1)
   ->  Hash Join  (cost=204052.91..261519.92 rows=1 width=0) (actual
time=2017.241..2779.947 rows=101
loops=1)
 Hash Cond: (t2.c1 = t1.c1)
 ->  Seq Scan on t2  (cost=0.00..46217.00 rows=300 width=4)
(actual time=0.073..393.479
rows=300 loops=1)
 ->  Hash  (cost=204052.90..204052.90 rows=1 width=4) (actual
time=2017.130..2017.130 rows=101
loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 12kB
   ->  Seq Scan on t1  (cost=0.00..204052.90 rows=1 width=4)
(actual time=0.038..2017.105
rows=101 loops=1)
 Filter: ((c1 >= 100) AND (c1 <= 200))
 Rows Removed by Filter: 899
 Planning time: 0.113 ms
 Execution time: 2780.000 ms
(11 rows)


Parallel-case
set max_parallel_degree=4;

postgres=# Explain Analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1
AND t1.c1 BETWEEN 100 AND 200;
QUERY PLAN

--
 Aggregate  (cost=100895.52..100895.53 rows=1 width=0) (actual
time=67871.443..67871.443 rows=1 loops=1)
   ->  Gather  (cost=1000.00..100895.52 rows=1 width=0) (actual
time=0.653..67871.287 rows=101 loops=1)
 Number of Workers: 4
 ->  Nested Loop  (cost=0.00..99895.42 rows=1 width=0) (actual
time=591.408..16455.731 rows=20 loops=5)
   Join Filter: (t1.c1 = t2.c1)
   Rows Removed by Join Filter: 60599980
   ->  Parallel Seq Scan on t1  (cost=0.00..45345.09 rows=0
width=4) (actual time=433.350..433.386 rows=20 loops=5)
 Filter: ((c1 >= 100) AND (c1 <= 200))
 Rows Removed by Filter: 180
   ->  Seq Scan on t2  (cost=0.00..46217.00 rows=300
width=4) (actual time=0.005..395.480 rows=300 loops=101)
 Planning time: 0.114 ms
 Execution time: 67871.584 ms
(12 rows)

Without patch, parallel case

set max_parallel_degree=4;

Explain Analyze SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c1 AND t1.c1
BETWEEN 100 AND 200;
  QUERY PLAN

--
 Aggregate  (cost=103812.21..103812.22 rows=1 width=0) (actual
time=1207.043..1207.043 rows=1 loops=1)
   ->  Hash Join  (cost=46345.20..103812.21 rows=1 width=0) (actual
time=428.632..1207.027 rows=101 loops=1)
 Hash Cond: (t2.c1 = t1.c1)
 ->  Seq Scan on t2  (cost=0.00..46217.00 rows=300 width=4)
(actual time=0.034..375.670 rows=300 loops=1)
 ->  Hash  (cost=46345.19..46345.19 rows=1 width=4) (actual
time=428.557..428.557 rows=101 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 13kB
   ->  Gather  

Re: [HACKERS] Proposal: custom compression methods

2015-12-16 Thread Tomas Vondra

Hi,

On 12/13/2015 06:28 PM, Alexander Korotkov wrote:
>

Compression method of column would be stored in pg_attribute table.
Dependencies between columns and compression methods would be tracked in
pg_depend preventing dropping compression method which is currently in
use. Compression method of the attribute could be altered by ALTER TABLE
command.

ALTER TABLE table_name ALTER COLUMN column_name SET COMPRESSION METHOD
compname;


Do you plan to make this available in CREATE TABLE? For example 
Greenplum allows to specify COMPRESSTYPE/COMPRESSLEVEL per column.


What about compression levels? Do you plan to allow tweaking them? 
Tracking them would require another column in pg_attribute, probably.



Since mixing of different compression method in the same attribute
would be hard to manage (especially dependencies tracking), altering
attribute compression method would require a table rewrite.


I don't think the dependency tracking would be a big issue. The easiest 
we could do is simply track which columns used the compression type in 
the past, and scan them when removing it (the compression type).


I think the main obstacle to make this possible is the lack of free 
space in varlena header / need to add the ID of the compression method 
into the value.


FWIW I'd like to allow this (mixing compression types), but I don't 
think it's worth the complexity at this point. We can add that later, if 
it turns out to be a problem in practice (which it probably won't).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-16 Thread Dilip Kumar
On Wed, Dec 16, 2015 at 6:20 PM Amit Kapila  wrote:

> On Tue, Dec 15, 2015 at 7:31 PM, Robert Haas  wrote:
>>
>> On Mon, Dec 14, 2015 at 8:38 AM, Amit Kapila  wrote:

> In any case, I have done some more investigation of the patch and
> found that even without changing query-planner-related parameters, it
> seems to give bad plans (as in the example below [1]).  I think the
> costing of the rework each worker has to do is missing.

I have done some more testing using the TPC-H benchmark (for some of the
queries, especially for Parallel Hash Join), and the results summary is below.


Planning Time (ms)

Query      Base    Patch
TPC-H Q2   2.2     2.4
TPC-H Q3   0.67    0.71
TPC-H Q5   3.17    2.3
TPC-H Q7   2.43    2.4

Execution Time (ms)

Query      Base    Patch
TPC-H Q2   2826    766
TPC-H Q3   23473   24271
TPC-H Q5   21357   1432
TPC-H Q7   6779    1138
All test files and detailed plan output are attached to this mail:
q2.sql, q3.sql, q5.sql and q7.sql are the TPC-H benchmark's 2nd, 3rd, 5th
and 7th queries, and results with base and parallel join are attached as
q*_base.out and q*_parallel.out respectively.

Summary: With the TPC-H queries, wherever a Hash Join is pushed under the
Gather node a significant improvement is visible; with Q2, using 3 workers,
the time consumed is almost 1/3 of the base.


I observed one problem: with Q5 and Q7, some relation and snapshot
references are leaked and I am getting the warnings below; I haven't yet
looked into the issue.

WARNING:  relcache reference leak: relation "customer" not closed
WARNING:  relcache reference leak: relation "customer" not closed
WARNING:  relcache reference leak: relation "customer" not closed
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
WARNING:  relcache reference leak: relation "customer" not closed
CONTEXT:  parallel worker, PID 123413
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
CONTEXT:  parallel worker, PID 123413
WARNING:  relcache reference leak: relation "customer" not closed
CONTEXT:  parallel worker, PID 123412
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
CONTEXT:  parallel worker, PID 123412
WARNING:  relcache reference leak: relation "customer" not closed
CONTEXT:  parallel worker, PID 123411
WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still referenced
CONTEXT:  parallel worker, PID 123411
psql:q7.sql:40: WARNING:  relcache reference leak: relation "customer" not
closed
psql:q7.sql:40: WARNING:  Snapshot reference leak: Snapshot 0x2d1fee8 still
referenced

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Performance improvement for joins where outer side is unique

2015-12-16 Thread Tomas Vondra

Hi,

On 12/16/2015 01:27 AM, David Rowley wrote:


I've attached a rebased patch against current master as there were
some conflicts from the recent changes to LATERAL join.


Thanks. I've looked at the rebased patch and have a few minor comments.

0) I know the patch does not tweak costing - any plans in this
   direction? Would it be possible to simply use the costing used by
   semijoin?

1) nodeHashjoin.c (and other join nodes)

I've noticed we have this in the ExecHashJoin() method:

 /*
  * When the inner side is unique or we're performing a
  * semijoin, we'll consider returning the first match, but
  * after that we're done with this outer tuple.
  */
 if (node->js.unique_inner)
 node->hj_JoinState = HJ_NEED_NEW_OUTER;

That seems a bit awkward because the comment speaks about unique joins 
*OR* semijoins, but the check only references unique_inner. That of 
course works because we do this in ExecInitHashJoin():


switch (node->join.jointype)
{
case JOIN_SEMI:
hjstate->js.unique_inner = true;
/* fall through */

Either some semijoins may not be unique - in that case the naming is 
misleading at best, and we either need to find a better name or simply 
check two conditions like this:


 if (node->js.unique_inner || node->join.type == JOIN_SEMI)
node->hj_JoinState = HJ_NEED_NEW_OUTER;

Or all semijoins are unique joins, in which case the comment may need
rephrasing. But more importantly, it raises the question of why we're
detecting this in the executor and not in the planner. Because if we
detect it in the executor, we can't use this bit in costing, for example.


FWIW the misleading naming was actually mentioned in the thread by 
Horiguchi-san, but I don't see any response to that (might have missed 
it though).


2) analyzejoins.c

I see that this code in join_is_removable()

innerrel = find_base_rel(root, innerrelid);

if (innerrel->reloptkind != RELOPT_BASEREL)
return false;

was rewritten like this:

innerrel = find_base_rel(root, innerrelid);

Assert(innerrel->reloptkind == RELOPT_BASEREL);

That suggests that previously the function expected cases where
reloptkind may not be RELOPT_BASEREL, but now it'll error out in such
cases. I haven't noticed any changes in the surrounding code that'd
guarantee this won't happen, but I also haven't been able to come up 
with an example triggering the assert (haven't been trying too hard). 
How do we know the assert() makes sense?



3) joinpath.c

- either "were" or "been" seems redundant

/* left joins were already been analyzed for uniqueness in 
mark_unique_joins() */



4) analyzejoins.c

comment format broken

/*
 * mark_unique_joins
Analyze joins in order to determine if their inner side is unique based
on the join condition.
 */
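
Presumably the usual format was intended, i.e. something like:

/*
 * mark_unique_joins
 *		Analyze joins in order to determine if their inner side is
 *		unique based on the join condition.
 */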

5) analyzejoins.c

missing line before the comment

}
/*
 * rel_supports_distinctness
 *  Returns true if rel has some properties which can prove the 
relation
 *  to be unique over some set of columns.
 *

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Tom Lane
Andreas Karlsson  writes:
> Did you manage to figure out why one was better than the other? The 
> differences between the functions seem rather subtle.

I'm a bit suspicious of Merlin's recommendation as well.  Looking at
the readline 6.3 sources, it is rl_resize_terminal() not the other
one that is called when an actual SIGWINCH is handled.

Another issue is that rl_reset_screen_size() doesn't exist in older copies
of readline (I don't see it in 4.0 nor 4.2a, which is what I've got laying
around in my archives).  So even if we agree that that's the one to use
when available, we'd need configure logic to fall back to
rl_resize_terminal() otherwise.
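
Something like this, I suppose (assuming a configure probe that defines
HAVE_RL_RESET_SCREEN_SIZE when the newer function is present; the symbol
name here is just an assumption):

#ifdef HAVE_RL_RESET_SCREEN_SIZE
	rl_reset_screen_size();
#else
	rl_resize_terminal();
#endif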

> rl_resize_terminal()
> Respects COLUMNS and ROWS and also has some logic when echo mode is 
> turned on which I have not managed to understand yet.

It looks to me like what it's doing is repainting the current line
on the theory that it might be messed up.  Since we are, at this
point, presumably *not* in the middle of accepting a command line,
that should be unnecessary but also harmless.  If there is any
visible difference in behavior, I should think it would be
rl_resize_terminal() that produces more consistent results, because
it does have the repaint logic which the other lacks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andres Freund
Hi,

First off: Glad that you investigated, this has been bugging me.

On 2015-12-14 15:57:22 -0600, Merlin Moncure wrote:
> Also, after some experimentation I had better luck with
> rl_reset_screen_size() (vs rl_resize_terminal()) that seemed to give
> more regular behavior with the prompt.  So the consolidated patch with
> the configure check is attached.

Hm. rl_reset_screen_size() works well for me. rl_resize_terminal() not
so much.  Apparently the reason for that is that rl_reset_screen_size()
doesn't set ignore_env to to true when calling
_rl_get_screen_size(). I've verified that just toggling that parameter
changes the behaviour.

What bugs me about that a bit is that _rl_sigwinch_resize_terminal()
sets ignore_env to true.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fix for readline terminal size problems when window is resized with open pager

2015-12-16 Thread Andres Freund
On 2015-12-16 11:21:53 -0500, Tom Lane wrote:
> It looks to me like what it's doing is repainting the current line
> on the theory that it might be messed up.  Since we are, at this
> point, presumably *not* in the middle of accepting a command line,
> that should be unnecessary but also harmless.  If there is any
> visible difference in behavior, I should think it would be
> rl_resize_terminal() that produces more consistent results, because
> it does have the repaint logic which the other lacks.

The repainting actually causes trouble here, if I have \timing enabled.
Even without an actual resize event it leads to the Time: line being
displayed on the same line as the query, after the pager was
active. It looks like
postgres[8444][1]=# SELECT * .. \n ... \n
 UNION ALL
 SELECT 1;Time: 2.311 ms

So something isn't entirely right after a redraw...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Experimental evaluation of PostgreSQL's query optimizer

2015-12-16 Thread Viktor Leis
Hi,

We have recently performed an experimental evaluation of PostgreSQL's
query optimizer. For example, we measured the contributions of
cardinality estimation and the cost model on the overall query
performance. You can download the resulting paper here:
http://www.vldb.org/pvldb/vol9/p204-leis.pdf

Some findings:
1. Perhaps unsurprisingly, we found that cardinality
estimation is the biggest problem in query optimization.
2. The quality of Postgres' cardinality estimates is not generally worse
than that of the major commercial systems.
3. It seems to me that one obvious way to avoid many bad situations
would be to disable nested loop joins when the inner relation is NOT
an index scan.
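
As a rough illustration, in planner-style C (a sketch with invented
variable names, not actual PostgreSQL code):

  /* where nested loop paths get generated for a join: */
  if (!IsA(inner_path, IndexPath))
      continue;		/* inner side is not an index scan: skip the nestloop */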

I hope this will be of interest to some of you.

--

Viktor Leis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A typo in syncrep.c

2015-12-16 Thread Kyotaro HORIGUCHI
Hello, I think I found a typo in a comment of syncrep.c.

>   * acknowledge the commit nor raise ERROR or FATAL.  The latter would
> - * lead the client to believe that that the transaction aborted, which
>   * is not true: it's already committed locally. The former is no good

The 'that' looks duplicated. And it might be better to put a
be-verb before 'aborted'.

> + * lead the client to believe that the transaction is aborted, which

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 6eb92736d318f6be35459d4406585aa34233bd5d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 16 Dec 2015 16:52:19 +0900
Subject: [PATCH] fix typo of syncrep.c

---
 src/backend/replication/syncrep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 325239d..98f7d73 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -183,7 +183,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		/*
 		 * If a wait for synchronous replication is pending, we can neither
 		 * acknowledge the commit nor raise ERROR or FATAL.  The latter would
-		 * lead the client to believe that that the transaction aborted, which
+		 * lead the client to believe that the transaction is aborted, which
 		 * is not true: it's already committed locally. The former is no good
 		 * either: the client has requested synchronous replication, and is
 		 * entitled to assume that an acknowledged commit is also replicated,
-- 
1.8.3.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-16 Thread Haribabu Kommi
On Wed, Dec 16, 2015 at 8:19 AM, Tomas Vondra  wrote:
> Hi,
>
> I've reviewed the patch today, after re-reading the whole discussion.

Thanks for the review.

> The one unsolved issue is what to do with 1e24cf64. My understanding is that
> the current patch still requires reverting of that patch, but my feeling is
> TL won't be particularly keen about doing that. Or am I missing something?

Until the pg_hba_lookup function, the parsed hba lines were not used in
the backend; they were used only by the postmaster process for
authentication. The parsed hba lines would occupy extra memory in the
backend process for no benefit, and for this reason TL changed the
allocation to PostmasterContext instead of TopMemoryContext.

> The current patch (v6) also triggers a few warnings during compilation,
> about hostname/address being unitialized in pg_hba_lookup(). That happens
> because 'address' is only set when (! PG_ARGISNULL(2)). Fixing it is as
> simple as
>
> char	   *address = NULL;
> char	   *hostname = NULL;
>
> at the beginning of the function (this seems correct to me).

corrected.

> The current patch also does not handle 'all' keywords correctly - it
> apparently just compares the values as strings, which is incorrect. For
> example this
>
> SELECT * FROM pg_hba_lookup('all', 'all')
>
> matches this pg_hba.conf line
>
local   all   all   trust
>
> That's clearly incorrect, as Alvaro pointed out.

In the above case, 'all' is taken as the database and user names.
The pg_hba line contains the keyword 'all' as database and user, and
such a line can match any database and user names provided by the
user. For this reason, it matches the first line of pg_hba.conf.

I feel it is fine. Please let me know if you are expecting a different
behavior.

> I'm also wondering whether we really need three separate functions in
> pg_proc.
>
> pg_hba_lookup(database, user)
> pg_hba_lookup(database, user, address)
> pg_hba_lookup(database, user, address, ssl_inuse)
>
> Clearly, that's designed to match the local/host/hostssl/hostnossl cases
> available in pg_hba. But why not to simply use default values instead?
>
> pg_hba_lookup(database TEXT, user TEXT,
>   address TEXT DEFAULT NULL,
>   ssl_inuse BOOLEAN DEFAULT NULL)
>

The function is changed to accept default values.

Apart from the above, I added a local memory context to allocate the memory
required for forming the tuple for each line. This context is reset for
every hba line, to avoid consuming unnecessary memory with huge
pg_hba.conf files.
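
The pattern is roughly the following (identifier names here are
illustrative, not necessarily the ones used in the patch):

  ListCell   *lc;
  MemoryContext linecxt = AllocSetContextCreate(CurrentMemoryContext,
                                                "hba line context",
                                                ALLOCSET_SMALL_MINSIZE,
                                                ALLOCSET_SMALL_INITSIZE,
                                                ALLOCSET_SMALL_MAXSIZE);

  foreach(lc, parsed_hba_lines)
  {
      MemoryContext oldcxt = MemoryContextSwitchTo(linecxt);

      /* ... build and emit the result tuple for this hba line ... */

      MemoryContextSwitchTo(oldcxt);
      MemoryContextReset(linecxt);	/* drop this line's allocations */
  }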

The revert_hba_context_release_in_backend patch, apart from reverting
commit 1e24cf64, changes the tokenize_file function to allocate the new
context in CurrentMemoryContext instead of TopMemoryContext.

Patch apply process:
1. revert_hba_context_release_in_backend_2.patch
2. pg_hba_lookup_poc_v7.patch

Regards,
Hari Babu
Fujitsu Australia


revert_hba_context_release_in_backend_2.patch
Description: Binary data


pg_hba_lookup_poc_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing initially_valid values instead of !skip_validation to StoreRelCheck() in AddRelationNewConstraints()

2015-12-16 Thread amul sul
Updated patch to add this table creation case in regression tests.
PFA patch version V3.

 
Regards,
Amul Sul

transformCheckConstraints-function-to-overrid-not-valid_V3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental evaluation of PostgreSQL's query optimizer

2015-12-16 Thread Simon Riggs
On 16 December 2015 at 09:51, Viktor Leis  wrote:

> Hi,
>
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf


Thank you, an excellent contribution.


> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
>

Good to hear, though we have found problems there that alter plan selection
adversely for TPC-H with the current optimizer. We have some improvements
which may be in the next release.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] pg_stat_replication log positions vs base backups

2015-12-16 Thread Magnus Hagander
On Wed, Dec 16, 2015 at 8:34 AM, Michael Paquier  wrote:

> On Mon, Dec 14, 2015 at 8:59 AM, Michael Paquier
>  wrote:
> > On Mon, Dec 14, 2015 at 1:01 AM, Magnus Hagander  wrote:
> >> I've applied these two patches now.
> >>
> >> The one that fixes the initialization backpatched to 9.3 which is the
> oldest
> >> one that has it, and the one that changes the actual 0-vs-NULL output
> to 9.5
> >> only as it's a behaviour change.
> >
> > Thanks!
>
> Interesting. Just today I got a bug report that is actually a symptom
> that people should be careful about: it is possible that
> pg_stat_replication reports 1/potential for sync_priority/sync_state
> in the case of a WAL sender in "backup" state: a base backup just
> needs to reuse the shared memory slot of a standby that was previously
> connected. Commit 61c7bee of Magnus fixes the issue; let's just be
> careful about similar reports that do not include this fix.
>

Hmm. With the fix, it returns "async", right?

Perhaps it should return either "backup" or NULL, to be even more clear?
And with priority set to NULL?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pgbench stats per script & other stuff

2015-12-16 Thread Fabien COELHO



It seems also that it would be a good idea to split the patch into two
parts:
1) Refactor the code so as the existing test scripts are put under the
same umbrella with addScript, adding at the same time the new option
-b.
2) Add the weight facility and its related statistics.


Sigh. The patch & documentation are probably not independent, so that
would make two dependent patches.


I am not really saying so; it just seems that doing the refactoring
(with its related docs), and then adding the extension for the weight
(with its docs), is more natural than doing both things at the same
time.


Ok. I can separate the refactoring (scripts & stats) and the weight stuff 
on top of the refactoring.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers